Posted to commits@jackrabbit.apache.org by Apache Wiki <wi...@apache.org> on 2005/06/09 05:34:14 UTC

[Jackrabbit Wiki] Update of "PersistenceManagerFAQ" by edgarpoce

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Jackrabbit Wiki" for change notification.

The following page has been changed by edgarpoce:
http://wiki.apache.org/jackrabbit/PersistenceManagerFAQ

New page:
= PersistenceManager (PM) FAQ =
The responses were mainly gathered from the Jackrabbit mailing list.

=== What's a PM? ===
The PM is an *internal* Jackrabbit component that handles the persistent storage of content nodes and properties. Each workspace of a Jackrabbit content repository uses a separate persistence manager to store the content in that workspace. The Jackrabbit version handler also uses a separate persistence manager. The PM sits at the very bottom layer of Jackrabbit's system architecture.
Reliability, integrity and performance of the PM are *crucial* to the overall stability and performance of the repository. If, for example, the data that a PM is based upon is allowed to change through external means, the integrity of the repository would be at risk (think of referential integrity and node references).

=== What's the PM's responsibility? ===
The PM interface was never intended to be a general SPI that you could implement in order to integrate external data sources with proprietary formats (e.g. a customer's database). The reason why we abstracted the PM interface was to leave room for future performance optimizations that would not affect the rest of the implementation (e.g. by storing the raw data in a b-tree based database instead of individual files).

=== How smart should a PM be? ===
A PM should not be 'intelligent'; it should not 'interpret' the data. The only thing it should care about is to efficiently, consistently and reliably store and read the data encapsulated in the passed nodeState and propertyState objects. Though it might be feasible to write a custom persistence manager to represent existing legacy data in a level-1 (read-only) repository, I don't think the same is possible for a level-2 repository and I certainly would not recommend it.
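
To illustrate what a 'non-intelligent' store looks like, here is a minimal sketch. It assumes a hypothetical `OpaqueItemStore` contract and an in-memory `InMemoryItemStore` backend. Neither is the actual Jackrabbit PersistenceManager interface; the names and signatures are made up for illustration. The store only sees an identifier and an opaque byte array and never looks inside the data.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical contract, NOT the Jackrabbit PersistenceManager interface:
// the store sees only an identifier and an opaque byte array.
interface OpaqueItemStore {
    void store(String itemId, byte[] state);
    byte[] load(String itemId); // returns null if the id is unknown
    boolean exists(String itemId);
}

// In-memory stand-in for a real backend (individual files, a b-tree database, ...).
class InMemoryItemStore implements OpaqueItemStore {
    private final Map<String, byte[]> items = new ConcurrentHashMap<>();

    @Override
    public void store(String itemId, byte[] state) {
        // Defensive copy so later changes by the caller cannot alter stored data.
        items.put(itemId, Arrays.copyOf(state, state.length));
    }

    @Override
    public byte[] load(String itemId) {
        byte[] state = items.get(itemId);
        return state == null ? null : Arrays.copyOf(state, state.length);
    }

    @Override
    public boolean exists(String itemId) {
        return items.containsKey(itemId);
    }
}

class ItemStoreDemo {
    public static void main(String[] args) {
        OpaqueItemStore store = new InMemoryItemStore();
        byte[] state = {1, 2, 3};
        store.store("node-1", state);
        // The bytes come back unchanged; the store never interpreted them.
        System.out.println(Arrays.equals(state, store.load("node-1")));
        System.out.println(store.exists("node-2"));
    }
}
```

Because the contract carries no knowledge of the content model, the backend could in principle be swapped without touching the layers above it, which is the kind of room for optimization described earlier.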

=== What about ORM-backed PMs? ===
Persistence managers that store the item states in a complex schema are not the right way to go. Keep it simple, e.g. the objectPersistenceManager stores the item states as a raw stream of bytes.
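
As a rough sketch of what 'a raw stream of bytes' can mean, the example below flattens a simplified property state into a byte array and reads it back. `SimplePropertyState` is a hypothetical stand-in, not a Jackrabbit class, and the byte layout is an assumption made for illustration; it is not the format used by the objectPersistenceManager.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a property state; not a Jackrabbit class.
class SimplePropertyState {
    final String name;
    final List<String> values;

    SimplePropertyState(String name, List<String> values) {
        this.name = name;
        this.values = values;
    }

    // Write the state as a flat byte stream: name, value count, then each value.
    // Note: writeUTF limits each string to 65535 encoded bytes.
    byte[] toBytes() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(name);
            out.writeInt(values.size());
            for (String value : values) {
                out.writeUTF(value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Read the fields back in exactly the order they were written.
    static SimplePropertyState fromBytes(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String name = in.readUTF();
            int count = in.readInt();
            List<String> values = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                values.add(in.readUTF());
            }
            return new SimplePropertyState(name, values);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this approach the storage layer needs no schema at all: one identifier maps to one blob, and only the code that writes and reads the blob knows its layout.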

=== What combination of FS and PM is the best choice? ===
It depends on your priorities. If you want to store your data in an accessible format (just in case ;), you might want to try XML PM + localFileSystem. If you use Windows and performance is a must, you might want to try objectPersistenceManager + cqfs.

=== What are the current options? What are the status, pros and cons of each implementation? ===

=== objectPersistenceManager ===
 * Status: mature
 * Simple
 * Not human readable
 * An inconsistency is hard to fix without a tool
 * Easy to configure
 * Write operations are synchronized
 * If the JVM process is killed, the repository might become inconsistent
 * Non-transactional

=== xml persistenceManager ===
 * Status: mature
 * Not so simple, but human readable
 * Easy to configure
 * Write operations are synchronized
 * If the JVM process is killed, the repository might become inconsistent
 * Non-transactional

=== ORM persistenceManagers ===
 * Status: work in progress
 * Unnecessary complexity
 * Transactional
 * RDBMS referential integrity (possible, but not implemented yet)
 * Not so easy to configure
 * Multithread friendly. Write operations don't need to be synchronized.

=== localFileSystem ===
 * Status: mature
 * Slow on Windows boxes

=== CQFS file system ===
 * Status: mature
 * Mysterious configuration options ;)
 * Mysterious proprietary binary format ;)
 * Fast on Windows
 * License issue: it's proprietary

Re: [Jackrabbit Wiki] Update of "PersistenceManagerFAQ" by edgarpoce

Posted by Serge Huber <sh...@jahia.com>.
Edgar Poce wrote:

>>If they are not to most people tastes, maybe we should just remove them ?
>>    
>>
>On the contrary, I'd like ORM PMs reach production status. I'd like to
>use a PM with transactional support, that's why I keep an eye on the
>status of your contribution.
>  
>
Ok thanks, so it seems I was over-reacting a bit. As I said I'm under 
some pressure right now and I still want to do my best to contribute to 
Jackrabbit.

>
>I better remove some lines from the wiki. And I ask you please to
>remove or add anything you consider, it's a wiki, right? ;)
>  
>
Thanks... It's fine now :)

cheers,
   Serge...

Re: [Jackrabbit Wiki] Update of "PersistenceManagerFAQ" by edgarpoce

Posted by Edgar Poce <ed...@gmail.com>.
Hi serge

On 6/9/05, Serge Huber <sh...@jahia.com> wrote:
> 
> Maybe it's me overreacting (I do have a lot of pressure at work right
> now) but I'm getting a lot of negative karma about the ORM-PMs... 
Sorry, it was not my intention.

> If they are not to most people tastes, maybe we should just remove them ?
On the contrary, I'd like ORM PMs reach production status. I'd like to
use a PM with transactional support, that's why I keep an eye on the
status of your contribution.

> 
> For me it was mostly a way to propose quickly a new back-end to
> Jackrabbit and get my hands dirty with the project. But if most people
> on the project feel that they "add unnecessary complexity", or "are not
> the right way to go", then maybe they should be removed ?
I better remove some lines from the wiki. And I ask you please to
remove or add anything you consider, it's a wiki, right? ;)

BR,
edgar
> 
> Regards,
>   Serge Huber.

Re: [Jackrabbit Wiki] Update of "PersistenceManagerFAQ" by edgarpoce

Posted by Serge Huber <sh...@jahia.com>.
Stefan Guggisberg wrote:

>don't worry, serge. i admit that i am not a huge fan of object relational 
>databases and sorts but that's also a question of personal taste i guess. 
>i don't have a problem with ORM PM in contrib and there seems to be
>interest in it.
>  
>
Ok thanks Stefan, I just wanted to get the "pulse" of the people here. I 
do agree that they shouldn't come out of contrib though, as they will be 
an "optional" setup.

Regards,
  Serge Huber.


Re: [Jackrabbit Wiki] Update of "PersistenceManagerFAQ" by edgarpoce

Posted by Stefan Guggisberg <st...@gmail.com>.
On 6/9/05, Serge Huber <sh...@jahia.com> wrote:
> 
> Maybe it's me overreacting (I do have a lot of pressure at work right
> now) but I'm getting a lot of negative karma about the ORM-PMs... If
> they are not to most people tastes, maybe we should just remove them ?
> 
> For me it was mostly a way to propose quickly a new back-end to
> Jackrabbit and get my hands dirty with the project. But if most people
> on the project feel that they "add unnecessary complexity", or "are not
> the right way to go", then maybe they should be removed ?

don't worry, serge. i admit that i am not a huge fan of object relational 
databases and sorts but that's also a question of personal taste i guess. 
i don't have a problem with ORM PM in contrib and there seems to be
interest in it.

cheers
stefan

> 
> Regards,
>   Serge Huber.
> 

Re: [Jackrabbit Wiki] Update of "PersistenceManagerFAQ" by edgarpoce

Posted by Serge Huber <sh...@jahia.com>.
Maybe it's me overreacting (I do have a lot of pressure at work right 
now) but I'm getting a lot of negative karma about the ORM-PMs... If 
they are not to most people tastes, maybe we should just remove them ?

For me it was mostly a way to propose quickly a new back-end to 
Jackrabbit and get my hands dirty with the project. But if most people 
on the project feel that they "add unnecessary complexity", or "are not 
the right way to go", then maybe they should be removed ?

Regards,
  Serge Huber.



Re: [Jackrabbit Wiki] Update of "PersistenceManagerFAQ" by edgarpoce

Posted by Serge Huber <sh...@jahia.com>.
If I may, I think just changing the phrasing to something like :

Persistence managers that store the item states in a complex schema are not the best option for most users.

Regards,
  Serge Huber.

ps : my schema is not that complex :) And in a later version I will probably adopt something similar to Edgar's JDBC schema (with one table per property type).


Stefan Guggisberg wrote:

>hi edgar
>first of all thanks for this very informative and useful FAQ.
>i only have one comment:
>
>  
>
>>=== What about ORM-backed PMs? ===
>>Persistence managers that store the item states in a complex schema are not the right
>>way to go. Keep it simple, e.g. the objectPersistenceManager stores the item states as
>>a raw stream of bytes.
>>
>>    
>>
>i remember having said something like that in a previous thread. this
>is my very personal view and although i still think it is correct i'd
>rather like to have it understood as a recommendation, not an absolute
>statement. what do you think?
>
>thanks
>stefan
>
>  
>


Re: [Jackrabbit Wiki] Update of "PersistenceManagerFAQ" by edgarpoce

Posted by Stefan Guggisberg <st...@gmail.com>.
hi edgar
first of all thanks for this very informative and useful FAQ.
i only have one comment:

> === What about ORM-backed PMs? ===
> Persistence managers that store the item states in a complex schema are not the right
> way to go. Keep it simple, e.g. the objectPersistenceManager stores the item states as
> a raw stream of bytes.
> 
i remember having said something like that in a previous thread. this
is my very personal view and although i still think it is correct i'd
rather like to have it understood as a recommendation, not an absolute
statement. what do you think?

thanks
stefan