Posted to dev@jackrabbit.apache.org by Vadim Gritsenko <va...@reverycodes.com> on 2005/08/31 18:33:20 UTC

Re: Is JDBC persistence manager supported by jackrabbit?

Edgar Poce wrote:
> Hi nafise
> 
>> is there any problem with this issue that caused it to be
>> discarded for inclusion?
>> I mean, I think having a jdbc persistence manager with jackrabbit
>> is a really necessary feature. Why is this issue discarded?
> 
> It doesn't mean that any JDBC PersistenceManager implementation is 
> discarded for inclusion, just the one proposed in JCR-91. You'll find 
> more detailed info in the archives[1] and in the wiki[2].

Edgar,

Was trying to find more information following your references, but...

> [1] http://thread.gmane.org/gmane.comp.apache.jackrabbit.devel/1435

Points to JIRA which states [1]:

    Comment by Edgar Poce [12/Jul/05 06:00 AM]
    This kind of approach is discouraged by design

Can you please clarify your point? Or maybe point to the document / discussion 
regarding the design?

> [2] http://wiki.apache.org/jackrabbit/PersistenceManagerFAQ

Points to Wiki page which does not clarify your POV either. It states though:

    The PM interface was never intended as being a general SPI that
    you could implement in order to integrate external datasources
    with proprietary formats (e.g. a customers database).

This raises the question, what is the recommended SPI to code against?

PS: The Wiki page has an incorrect statement:

     XML PersistenceManager
       * Write operations are synchronized

AFAICS, the XML PM (unnecessarily) synchronizes all calls, including load() and 
exists() calls. Does it mean the FileSystem interface is considered to be single 
threaded? That does not make much sense, though...

Thanks,
Vadim

[1] http://issues.apache.org/jira/browse/JCR-91#action_12315534

Re: Is JDBC persistence manager supported by jackrabbit?

Posted by Vadim Gritsenko <va...@reverycodes.com>.

Marcel Reutegger wrote:
> Vadim Gritsenko wrote:
> 
>> Marcel Reutegger wrote:
>>
>>> Feel free to provide patches to enhance concurrency.
>>
>> My first patch then will be a port of connection pools from Edgar's JDBC 
>> PM. Once DB PM has access to DB connection pool, there will be no need 
>> for any synchronizations. Would you accept it?
> 
> hmm, I might be wrong, but I don't think Edgar has implemented a DB 
> connection pool. Or are you referring to some pooling implementation in OJB?

He implemented a connection factory with several implementations, providing ways to 
configure a new connection pool or use an existing one.


> When submitting a patch please try to keep the newly introduced 
> dependencies to a minimum. We also prefer implementations that work with 
> a minimum of configuration overhead.

Noted.


>>> Some enhancements that crossed my mind are:
>>> - use a separate read-only connection for load() and exists() operations
>>> - use a pool of prepared statements for load() and exists()
>>
>> There are issues with single/double-connection design, besides the fact 
>> that (j2ee) applications are discouraged from managing system 
>> resources themselves:
> 
> Jackrabbit is not a j2ee application, but rather a resource itself.

:-)

Dunno about you, but I envision that jackrabbit will be used from within a webapp 
in 90% of situations (hence will have jndi, a connection pool, and sometimes xa tx), 
in desktop apps in 1% ;-), and the rest are 'big iron' non-clustered deployments - 
the only way to scale those will be using bigger hardware, right? :-)


> Jackrabbit also runs without an application server and should therefore 
> not require j2ee infrastructure (though it may use it if available). 
> Which makes it a question of internal design how jackrabbit handles 
> resources.

Of course, and it should stay that way too. But if j2ee is present it should 
take advantage of that, right? :-)

Vadim

Re: Is JDBC persistence manager supported by jackrabbit?

Posted by Marcel Reutegger <ma...@gmx.net>.

Vadim Gritsenko wrote:
> Marcel Reutegger wrote:
>> Feel free to provide patches to enhance concurrency.
> 
> My first patch then will be a port of connection pools from Edgar's JDBC 
> PM. Once DB PM has access to DB connection pool, there will be no need 
> for any synchronizations. Would you accept it?

hmm, I might be wrong, but I don't think Edgar has implemented a DB 
connection pool. Or are you referring to some pooling implementation in OJB?

When submitting a patch please try to keep the newly introduced 
dependencies to a minimum. We also prefer implementations that work with 
a minimum of configuration overhead.

>> Some enhancements that crossed my mind are:
>> - use a separate read-only connection for load() and exists() operations
>> - use a pool of prepared statements for load() and exists()
> 
> There are issues with single/double-connection design, besides the fact 
> that (j2ee) applications are discouraged from managing system resources 
> themselves:

Jackrabbit is not a j2ee application, but rather a resource itself. 
Jackrabbit also runs without an application server and should therefore 
not require j2ee infrastructure (though it may use it if available). 
Which makes it a question of internal design how jackrabbit handles 
resources.

regards
  marcel

Re: Is JDBC persistence manager supported by jackrabbit?

Posted by Vadim Gritsenko <va...@reverycodes.com>.

Marcel Reutegger wrote:
> Vadim Gritsenko wrote:
> 
>> Edgar Poce wrote:
>>
>>> when I decided to write the jdbc pm proposed in jcr-91 I wanted:
>>>
>>> 1 - a mature, transactional and scalable persistence storage
>>> 2 - use rdbms administrative tools, like scheduled backups, etc.
>>> 3 - rdbms referential integrity
>>> 4 - avoid redundancy. PMs store the NodeReferences twice.
>>> 5 - a storage that allows to modify the data easily, just in case.
>>
>> I need at least 1, 2, and clustering on top of that... None of 
>> existing PMs will work in cluster environment (OJB and Hibernate do 
>> not count).
> 
> Please note that clustering Jackrabbit is not just about the persistence 
> manager. It also involves many other areas that we need to take care of.

I know. But having a transactional, clustered PM will enable me to create a cluster 
of Level 1 repository instances and run them on app servers. The next step can be 
enabling flushing/synchronization of caches on those Level 1 instances. And 
after all that is done, full clustering (with distributed locking, etc) will be 
easier to tackle.

> See: http://issues.apache.org/jira/browse/JCR-169 for a starting point 
> on discussions about this topic.

Thanks for the pointer.

>> Why wait release? :-) Isn't code in contrib meant to be grounds for 
>> experimental code? :-) Let's bring it up before that - SimpleDB isn't 
>> usable as well:
>>
>>   * Synchronized to death
>>   * Stored BLOBs locally
> 
> 
> Feel free to provide patches to enhance concurrency.

My first patch then will be a port of the connection pools from Edgar's JDBC PM. Once 
the DB PM has access to a DB connection pool, there will be no need for any 
synchronization. Would you accept it?

> Some enhancements that crossed my mind are:
> - use a separate read-only connection for load() and exists() operations
> - use a pool of prepared statements for load() and exists()

There are issues with the single/double-connection design, besides the fact that 
(j2ee) applications are discouraged from managing system resources themselves:

   * No transaction isolation - which brings need for synchronizations
   * No keep-alive monitoring
   * No ability to reconnect severed connection

As for statement caching, IIRC the driver does this.
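
To illustrate the pooling idea, here is a minimal sketch in Java. It is not
code from Jackrabbit or from Edgar's JDBC PM: SimplePool and withResource are
hypothetical names, and in a webapp a container-managed javax.sql.DataSource
would normally play this role (and would also take care of keep-alive and
reconnects, which this sketch does not). Each operation borrows its own
resource, so callers do not have to synchronize on one shared connection:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical, minimal pool; T stands in for a JDBC connection.
final class SimplePool<T> {
    private final BlockingQueue<T> idle;

    SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    // Borrows one resource for a single operation and always returns it.
    <R> R withResource(Function<T, R> work) {
        T resource;
        try {
            resource = idle.take(); // blocks while all resources are in use
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a resource", e);
        }
        try {
            return work.apply(resource);
        } finally {
            idle.add(resource); // a slot is free because we took one above
        }
    }

    // Number of resources currently not borrowed.
    int idleCount() {
        return idle.size();
    }
}
```

A load() or exists() call would then run inside withResource(...), each on its
own borrowed connection.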

> With those changes we can then loosen some of the synchronization.
> 
> BLOBs are stored locally because many DBs are known for their bad 
> performance when it comes to handling streams. So, speaking of 
> enhancements, introducing a configuration choice for BLOB handling is 
> probably another one.

Locally stored BLOBs might be OK for a non-clustered environment. They might even be 
OK in some cluster deployments, if there is a replication mechanism.

But I don't think it is a good idea to replicate the full set of BLOBs to each 
server (multiple times, if the server runs more than one webapp) which happens to 
need access to the repository. I prefer having all BLOBs in one place, 
even if it is a bit slower...

Vadim

Re: Is JDBC persistence manager supported by jackrabbit?

Posted by Marcel Reutegger <ma...@gmx.net>.

Vadim Gritsenko wrote:
> Edgar Poce wrote:
>> when I decided to write the jdbc pm proposed in jcr-91 I wanted:
>>
>> 1 - a mature, transactional and scalable persistence storage
>> 2 - use rdbms administrative tools, like scheduled backups, etc.
>> 3 - rdbms referential integrity
>> 4 - avoid redundancy. PMs store the NodeReferences twice.
>> 5 - a storage that allows to modify the data easily, just in case.
> 
> I need at least 1, 2, and clustering on top of that... None of existing 
> PMs will work in cluster environment (OJB and Hibernate do not count).

Please note that clustering Jackrabbit is not just about the persistence 
manager. It also involves many other areas that we need to take care of.
See: http://issues.apache.org/jira/browse/JCR-169 for a starting point 
on discussions about this topic.

> Why wait release? :-) Isn't code in contrib meant to be grounds for 
> experimental code? :-) Let's bring it up before that - SimpleDB isn't 
> usable as well:
> 
>   * Synchronized to death
>   * Stored BLOBs locally

Feel free to provide patches to enhance concurrency. Some enhancements 
that crossed my mind are:
- use a separate read-only connection for load() and exists() operations
- use a pool of prepared statements for load() and exists()

With those changes we can then loosen some of the synchronization.

BLOBs are stored locally because many DBs are known for their bad 
performance when it comes to handling streams. So, speaking of 
enhancements, introducing a configuration choice for BLOB handling is 
probably another one.

regards
  marcel

Re: Is JDBC persistence manager supported by jackrabbit?

Posted by Vadim Gritsenko <va...@reverycodes.com>.

Edgar Poce wrote:
> Vadim Gritsenko wrote:
> 
>> But - this makes me wonder why OJB PM and Hibernate PM are considered 
>> to be acceptable design even though they are implemented exactly in 
>> the same way - they break up ItemState into parts!?
>
> who said orm pm is a best practice example?

But they are in SVN! :-) :-)

But I'll try the SimpleDB approach first - I understand it does not go against 
the Jackrabbit design philosophy.

Vadim

Re: Is JDBC persistence manager supported by jackrabbit?

Posted by Edgar Poce <ed...@gmail.com>.

Vadim Gritsenko wrote:
> Ok. Breaking ItemState into parts would tie storage layer to the 
> ItemState structure - which would make it impossible to make changes in 
> structure or hierarchy of ItemStates...
> 
> (Can one (potentially) have custom ItemState(s)?)
> 
I think it's not possible without modifying some core classes.

> But - this makes me wonder why OJB PM and Hibernate PM are considered to 
> be acceptable design even though they are implemented exactly in the 
> same way - they break up ItemState into parts!?
> 
who said orm pm is a best practice example?

>> I think that the jcr-ext project under contrib might be a good 
>> starting point. Or, despite the PM is not intended to be a SPI, you 
>> can handle to plug your legacy data if you do it carefully.
> 
> 
> Thanks for pointers. Do you suggest to use decorators? I don't see 
> though how they could be plugged in into the jackrabbit...

I'm not sure about this, actually I haven't used jcr-ext yet, but I 
think that it could be used as a base for a level 1 impl with legacy 
data independently from jackrabbit. See o.a.j.base package.
Another option would be to plug in your legacy data with a custom PM.

regards
edgar

> 
> Vadim
> 
>

Re: Is JDBC persistence manager supported by jackrabbit?

Posted by Vadim Gritsenko <va...@reverycodes.com>.

Edgar Poce wrote:
> Hi vadim
> 
> Vadim Gritsenko wrote:
> 
>> Was trying to find more information following your references, but...
>>
>>> [1] http://thread.gmane.org/gmane.comp.apache.jackrabbit.devel/1435
>>
>> Points to JIRA which states [1]:
>>
>>    Comment by Edgar Poce [12/Jul/05 06:00 AM]
>>    This kind of approach is discouraged by design
>>
>> Can you please clarify your point? 
> 
> There are a couple of conversations in the archive about this. My point 
> is that the PM contract is not suitable for mapping the itemstates into 
> a relational database with a table design that breaks the ItemState into 
> its constituent parts.

Ok. Breaking ItemState into parts would tie storage layer to the ItemState 
structure - which would make it impossible to make changes in structure or 
hierarchy of ItemStates...

(Can one (potentially) have custom ItemState(s)?)

But - this makes me wonder why the OJB PM and Hibernate PM are considered to be 
acceptable design even though they are implemented in exactly the same way - 
they break up ItemState into parts!?


> The PM is intended to keep it simple, which means 
> to store the itemstate as a whole without interpreting the data. See the 
> jdbc pm under contrib.

Yep, saw that. And was puzzled by OJB/Hibernate.


> The main problem to store the itemstates in a complex schema is the 
> Collection handling. Since Collection fields changes are not logged into 
> add/update/remove aware objects, all the elements in the Collection must 
> be stored on each write call. It causes a hit on performance when 
> handling collections with lots of elements, even with the simple PMs 
> included in the core.

Saw it in storeChildNodeEntries - yep, it sure should be slow.


> see the second chart in http://issues.apache.org/jira/browse/JCR-188. In 
> my PIV box with Object PM + cqfs, any write operation (e.g. set a 
> property) takes up to half a sec when the given node reaches 3k children.
> If I tried to run the same test with the impl proposed in jcr-91, the 
> half sec mark would be reached much sooner than with 3k children, just a 
> hundred children would make the repo unbearably slow.
> 
> when I decided to write the jdbc pm proposed in jcr-91 I wanted:
> 
> 1 - a mature, transactional and scalable persistence storage
> 2 - use rdbms administrative tools, like scheduled backups, etc.
> 3 - rdbms referential integrity
> 4 - avoid redundancy. PMs store the NodeReferences twice.
> 5 - a storage that allows to modify the data easily, just in case.

I need at least 1, 2, and clustering on top of that... None of the existing PMs will 
work in a cluster environment (OJB and Hibernate do not count).


> But in order to achieve the above goals the PM should interpret the data 
> :(. Maybe we can bring this up again after the first release ...

Why wait for the release? :-) Isn't contrib meant to be the place for experimental 
code? :-) Let's bring it up before that - SimpleDB isn't usable either:

   * Synchronized to death
   * Stores BLOBs locally


>> Or maybe point to the document /
>> discussion regarding the design?
>
> Even when it's not directly related you might want to take a look to the 
> Dominique's post about jackrabbit internals. See 
> http://article.gmane.org/gmane.comp.apache.jackrabbit.devel/1223

I remember seeing this post some time ago :-)


>>> [2] http://wiki.apache.org/jackrabbit/PersistenceManagerFAQ
>>
>> Points to Wiki page which does not clarify your POV either. 
> 
> It's not my point of view. I just collected the devs opinions on this 
> issue from the mailing list. If it's not clear please trace the 
> conversations in the archive and clarify it.

Tried to do that to no avail. Searches for 'JDBC', 'DB' do not give much.


>> It states though:
>>
>>    The PM interface was never intended as being a general SPI that
>>    you could implement in order to integrate external datasources
>>    with proprietary formats (e.g. a customers database).
>>
>> This raises the question, what is the recommended SPI to code against?
>
> I think that the jcr-ext project under contrib might be a good starting 
> point. Or, despite the PM is not intended to be a SPI, you can handle to 
> plug your legacy data if you do it carefully.

Thanks for the pointers. Do you suggest using decorators? I don't see, though, how 
they could be plugged into jackrabbit...

Vadim

Re: Is JDBC persistence manager supported by jackrabbit?

Posted by Vadim Gritsenko <va...@reverycodes.com>.

Marcel Reutegger wrote:
> Edgar Poce wrote:
> 
>> Vadim Gritsenko wrote:
>>
>>> PS Wiki page has incorrect statement:
>>>
>>>     XML PersistenceManager
>>>       * Write operations are synchronized
>>>
>>> AFAICS, XML PM (unnecessarily) synchronizes all calls, including 
>>> load() and exists() calls. 
>>
>> Why incorrect? maybe incomplete...
> 
> The current implementation of the XML PM serializes all calls to 
> store(), load() and exists(). This is because it operates on a 
> non-transactional store (a FileSystem implementation). The FileSystem 
> interface does not prevent dirty reads by its definition. Writing 
> changes to a FileSystem that involves multiple files therefore *must* 
> block reads, otherwise other sessions might see changes that are not yet 
> completely committed.
> 
> The crucial point is PersistenceManager.store() which states:
> 
> Atomically saves the given set of changes.
> 
> I agree, that's not extremely descriptive ;) but it actually describes 
> in one sentence what the PM has to guarantee.
> 
>>> Does it mean FileSystem interface considered to be
>>> single threaded? 
>>
>> I don't think so
> 
> No, the FileSystem interface does not specify any constraints on 
> concurrency. However each implementation will certainly contain some 
> synchronization in its internals. But that's something you don't have to 
> bother about when using a FileSystem.
> 
>>> Does not make much sense, though...
>>
>> I agree. I think that the concurrency issue was handled first at the 
>> SHISM level, then it was moved to the PM, and then back to the SHISM 
>> (see http://issues.apache.org/jira/browse/JCR-164). Those synchronized 
>> modifiers seem to be there because the PM contract is not very clear 
>> yet, at least for me :(.
> 
> basically what applies to a PM is also true for the 
> SharedItemStateManager (SHISM). But things are a bit more complex 
> because it involves an additional guarantee:
> 
> ItemState instances issued by the SHISM must be unique. The SHISM must 
> not return two distinct ItemState objects for the same ItemId!
> 
> But in the end, it's again the same contract as for a PM. Store 
> operations must be atomic.
> 
> The easiest implementation is to use a read-write lock, which is 
> currently used in SHISM.
> 
> This is certainly not the perfect solution. e.g. two ChangeLogs which do 
> not intersect could be stored concurrently (if the PM is able to manage 
> this). Similarly reading ItemStates that do not conflict with a 
> ChangeLog that is currently stored should not be blocked.
> 
> hope this clarifies things a bit...

Yes, it does. Thanks a lot.

It also means, though, that the XML PM is not for production use by design - in its 
current state. A note to that effect would be appropriate in the Javadoc, I think.

Vadim

Re: Is JDBC persistence manager supported by jackrabbit?

Posted by Marcel Reutegger <ma...@gmx.net>.

Edgar Poce wrote:
> Vadim Gritsenko wrote:
>> PS Wiki page has incorrect statement:
>>
>>     XML PersistenceManager
>>       * Write operations are synchronized
>>
>> AFAICS, XML PM (unnecessarily) synchronizes all calls, including load() 
>> and exists() calls. 
> 
> Why incorrect? maybe incomplete...

The current implementation of the XML PM serializes all calls to 
store(), load() and exists(). This is because it operates on a 
non-transactional store (a FileSystem implementation). The FileSystem 
interface does not prevent dirty reads by its definition. Writing 
changes to a FileSystem that involves multiple files therefore *must* 
block reads, otherwise other sessions might see changes that are not yet 
completely committed.

The crucial point is PersistenceManager.store() which states:

Atomically saves the given set of changes.

I agree, that's not extremely descriptive ;) but it actually describes 
in one sentence what the PM has to guarantee.

>> Does it mean FileSystem interface considered to be
>> single threaded? 
> 
> I don't think so

No, the FileSystem interface does not specify any constraints on 
concurrency. However each implementation will certainly contain some 
synchronization in its internals. But that's something you don't have to 
bother about when using a FileSystem.

>> Does not make much sense, though...
> 
> I agree. I think that the concurrency issue was handled first at the 
> SHISM level, then it was moved to the PM, and then back to the SHISM 
> (see http://issues.apache.org/jira/browse/JCR-164). Those synchronized 
> modifiers seem to be there because the PM contract is not very clear 
> yet, at least for me :(.

basically what applies to a PM is also true for the 
SharedItemStateManager (SHISM). But things are a bit more complex 
because it involves an additional guarantee:

ItemState instances issued by the SHISM must be unique. The SHISM must 
not return two distinct ItemState objects for the same ItemId!

But in the end, it's again the same contract as for a PM. Store 
operations must be atomic.
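
As an illustration of the uniqueness guarantee only, here is a sketch in
Java. It is not the actual SHISM code: UniqueStateCache and its loader are
made-up names, and K and V stand in for ItemId and ItemState. The point is
that concurrent callers asking for the same id get the same instance:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical cache: hands out at most one state object per id.
final class UniqueStateCache<K, V> {
    private final ConcurrentMap<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    UniqueStateCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // computeIfAbsent runs the loader atomically and at most once per key
    // while the entry is missing, so two threads cannot end up with two
    // different objects for the same id.
    V get(K id) {
        return cache.computeIfAbsent(id, loader);
    }
}
```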

The easiest implementation is to use a read-write lock, which is 
currently used in SHISM.

This is certainly not the perfect solution. e.g. two ChangeLogs which do 
not intersect could be stored concurrently (if the PM is able to manage 
this). Similarly reading ItemStates that do not conflict with a 
ChangeLog that is currently stored should not be blocked.
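
A rough sketch of the read-write lock approach in Java follows. LockedStore
is a hypothetical class, not the Jackrabbit PersistenceManager or SHISM API;
plain strings stand in for ItemId and ItemState, and a Map stands in for a
ChangeLog. Readers may run concurrently with each other, but never while a
change set is only partially applied:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical store guarded by a single read-write lock.
final class LockedStore {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, String> states = new HashMap<>();

    // Applies the whole change set under the write lock, so readers
    // observe either all of it or none of it.
    void store(Map<String, String> changes) {
        lock.writeLock().lock();
        try {
            states.putAll(changes);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Reads share the read lock and only block while a store() is running.
    String load(String id) {
        lock.readLock().lock();
        try {
            return states.get(id);
        } finally {
            lock.readLock().unlock();
        }
    }

    boolean exists(String id) {
        lock.readLock().lock();
        try {
            return states.containsKey(id);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

This is the coarse variant: even two change sets that do not intersect are
still stored one after the other.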

hope this clarifies things a bit...

regards
  marcel

Re: Is JDBC persistence manager supported by jackrabbit?

Posted by Serge Huber <sh...@jahia.com>.

Edgar Poce wrote:

> Hi serge
>
> Serge Huber wrote:
>
>> I was away for 3 weeks at the army so sorry for the late reply.
>>
> It's probably my non native english skills betraying me but I'm curious,
> is it some kind of expression or are you fighting a war in your spare
> time? ;)

Sorry I wasn't precise enough. As I am Swiss, we have mandatory 
weeks in the army every year. I just did 3 weeks of military service and 
I have a lot to catch up on now :)

> I didn't mean jcr-rmi could be used to scale jackrabbit horizontally. I
> meant that jcr-rmi could be used in a j2ee cluster scenario to access a
> single instance of jackrabbit, I'm thinking of something like ...
>
> Box1. App Server   Box2. App Server2
> jcr-rmi client      jcr-rmi client
> ----------------   ---------------
>         \                /
>          \              /
>          ----------------
>          BOX3. jackrabbit
>           jcr-rmi server

ok but this means BOX3 is a single point of failure. Ideally I'm 
interested in scenarios where scalability could be assured horizontally, 
as well as transaction management.

Regards,
  Serge...

>
> best regards,
> edgar
>
>>
>> Regards,
>>  Serge Huber.
>>
>
>

Re: Is JDBC persistence manager supported by jackrabbit?

Posted by Edgar Poce <ed...@gmail.com>.

Hi serge

Serge Huber wrote:
> I was away for 3 weeks at the army so sorry for the late reply.
> 
It's probably my non native english skills betraying me but I'm curious,
is it some kind of expression or are you fighting a war in your spare
time? ;)

> The RMI-cluster solution is interesting, but I worry about 
> connection/disconnection problems. The full database implementation 
> causes performance problems, especially for binary data. Basically what 
> this means is that we are implementing some sort of clustered 
> file-system, that supports transactions and is as high-performance as 
> possible.
I didn't mean jcr-rmi could be used to scale jackrabbit horizontally. I
meant that jcr-rmi could be used in a j2ee cluster scenario to access a
single instance of jackrabbit, I'm thinking of something like ...

Box1. App Server   Box2. App Server2
jcr-rmi client      jcr-rmi client
----------------   ---------------
         \                /
          \              /
          ----------------
          BOX3. jackrabbit
           jcr-rmi server

best regards,
edgar

> 
> Regards,
>  Serge Huber.
>

Re: Is JDBC persistence manager supported by jackrabbit?

Posted by Serge Huber <sh...@jahia.com>.

Hi Edgar,

I was away for 3 weeks at the army so sorry for the late reply.

> The main problem to store the itemstates in a complex schema is the 
> Collection handling. Since Collection fields changes are not logged 
> into add/update/remove aware objects, all the elements in the 
> Collection must be stored on each write call. It causes a hit on 
> performance when handling collections with lots of elements, even with 
> the simple PMs included in the core.

Actually there is a way to do that, and that is why I had custom 
implementations of the NodeState and PropertyState, so that I could use 
add/update/remove aware objects. Both Hibernate and OJB do this 
differently, but if implemented correctly, you do not have to rewrite 
the whole collection all the time.
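
To make the idea concrete, here is a small sketch in Java of an
add/remove-aware collection. It is an illustration only, not the Hibernate or
OJB collection types and not my custom NodeState: TrackedChildren is a made-up
name, and it assumes child names are unique (no same-name siblings). Only the
recorded deltas would need to be written on the next store:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical child list that remembers what changed since the last flush.
final class TrackedChildren {
    private final List<String> entries = new ArrayList<>();
    private final Set<String> added = new LinkedHashSet<>();
    private final Set<String> removed = new LinkedHashSet<>();

    TrackedChildren(List<String> persisted) {
        entries.addAll(persisted);
    }

    void add(String name) {
        entries.add(name);
        // Re-adding a name that was pending deletion just cancels the delete.
        if (!removed.remove(name)) {
            added.add(name);
        }
    }

    void remove(String name) {
        if (entries.remove(name)) {
            // Removing a name that was never persisted just cancels the insert.
            if (!added.remove(name)) {
                removed.add(name);
            }
        }
    }

    Set<String> pendingInserts() {
        return added;
    }

    Set<String> pendingDeletes() {
        return removed;
    }

    // Call after the deltas have been written to the database.
    void markFlushed() {
        added.clear();
        removed.clear();
    }
}
```

With three persisted children, adding one and removing another leaves one
pending insert and one pending delete, rather than a rewrite of all entries.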

But I ran into trouble because I had to copy the data between the 
original item state and my internal objects. That is why the 
implementation is so complex. If I could have re-used the objects as-is 
I would not have had this problem, but this was not possible because I 
needed to modify the collection implementations. Maybe there is a way to 
do this using aspects, but this would complicate things even further.

With hindsight, there is no perfect solution that is high-performance, 
transaction-aware and cluster-compliant. I don't really like the 
file-system BLOB solution because it causes problems with replication. 
The RMI-cluster solution is interesting, but I worry about 
connection/disconnection problems. The full database implementation 
causes performance problems, especially for binary data. Basically what 
this means is that we are implementing some sort of clustered 
file-system, that supports transactions and is as high-performance as 
possible.

Regards,
  Serge Huber.

Re: Is JDBC persistence manager supported by jackrabbit?

Posted by Edgar Poce <ed...@gmail.com>.

Hi vadim

Vadim Gritsenko wrote:
> Edgar,
> 
> Was trying to find more information following your references, but...
> 
>> [1] http://thread.gmane.org/gmane.comp.apache.jackrabbit.devel/1435
> 
> 
> Points to JIRA which states [1]:
> 
>    Comment by Edgar Poce [12/Jul/05 06:00 AM]
>    This kind of approach is discouraged by design
> 
> Can you please clarify your point? 

There are a couple of conversations in the archive about this. My point 
is that the PM contract is not suitable for mapping the itemstates into 
a relational database with a table design that breaks the ItemState into 
its constituent parts. The PM is intended to keep it simple, which means 
to store the itemstate as a whole without interpreting the data. See the 
jdbc pm under contrib.
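
A rough sketch in Java of what "store it as a whole" means. This is not the
jdbc pm under contrib: OpaqueStateStore is a made-up name, a Map stands in for
a table with an id column and a binary data column, and Java serialization
stands in for whatever format the real PM uses. The store never looks inside
the state:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical store: one opaque byte[] per item id, no interpretation.
final class OpaqueStateStore {
    private final Map<String, byte[]> rows = new HashMap<>();

    void store(String id, Serializable state) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(state);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        rows.put(id, bytes.toByteArray()); // the whole state is rewritten each time
    }

    // Returns the deserialized state, or null if the id is unknown.
    Object load(String id) {
        byte[] data = rows.get(id);
        if (data == null) {
            return null;
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```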

The main problem with storing the itemstates in a complex schema is the 
Collection handling. Since changes to Collection fields are not logged into 
add/update/remove aware objects, all the elements in the Collection must 
be stored on each write call. This causes a performance hit when 
handling collections with lots of elements, even with the simple PMs 
included in the core.

see the second chart in http://issues.apache.org/jira/browse/JCR-188. In 
my PIV box with Object PM + cqfs, any write operation (e.g. set a 
property) takes up to half a sec when the given node reaches 3k children.
If I tried to run the same test with the impl proposed in jcr-91, the 
half sec mark would be reached much sooner than with 3k children, just a 
hundred children would make the repo unbearably slow.

when I decided to write the jdbc pm proposed in jcr-91 I wanted:

1 - a mature, transactional and scalable persistence storage
2 - use rdbms administrative tools, like scheduled backups, etc.
3 - rdbms referential integrity
4 - avoid redundancy. PMs store the NodeReferences twice.
5 - a storage that allows to modify the data easily, just in case.

But in order to achieve the above goals the PM should interpret the data 
:(. Maybe we can bring this up again after the first release ...

> Or maybe point to the document /
> discussion regarding the design?
>
Even though it's not directly related, you might want to take a look at 
Dominique's post about jackrabbit internals. See 
http://article.gmane.org/gmane.comp.apache.jackrabbit.devel/1223

>> [2] http://wiki.apache.org/jackrabbit/PersistenceManagerFAQ
> 
> Points to Wiki page which does not clarify your POV either. 
It's not my point of view. I just collected the devs opinions on this 
issue from the mailing list. If it's not clear please trace the 
conversations in the archive and clarify it.

 > It states though:
> 
>    The PM interface was never intended as being a general SPI that
>    you could implement in order to integrate external datasources
>    with proprietary formats (e.g. a customers database).
> 
> This raises the question, what is the recommended SPI to code against?
> 
I think that the jcr-ext project under contrib might be a good starting 
point. Or, although the PM is not intended to be an SPI, you can manage to 
plug in your legacy data if you do it carefully.

> 
> PS Wiki page has incorrect statement:
> 
>     XML PersistenceManager
>       * Write operations are synchronized
> 
> AFAICS, XML PM (unnecessarily) synchronizes all calls, including load() 
> and exists() calls. 
Why incorrect? maybe incomplete...

 > Does it mean FileSystem interface considered to be
> single threaded? 
I don't think so

 > Does not make much sense, though...
> 
I agree. I think that the concurrency issue was handled first at the 
SHISM level, then it was moved to the PM, and then back to the SHISM 
(see http://issues.apache.org/jira/browse/JCR-164). Those synchronized 
modifiers seem to be there because the PM contract is not very clear 
yet, at least for me :(.

br,
edgar

> Thanks,
> Vadim
> 
> [1] http://issues.apache.org/jira/browse/JCR-91#action_12315534
> 
>