Posted to dev@jackrabbit.apache.org by Jukka Zitting <ju...@gmail.com> on 2006/03/18 15:22:22 UTC

Thoughts on database persistence

Hi,

I just added JNDI/DataSource-based versions of the database
persistence manager and file system classes as requested in JCR-313.
The comments on the issue thread got me thinking about the current
"simple" approach and database configurability in general. The
database file system classes are pretty much mirrors of their
persistence manager counterparts, so I'll just focus on the PM classes
here; the same ideas apply to both situations.

The JCR-313 issue thread focused largely on the question of
implementation "simplicity". After going through the code I think the
question has more to do with the approach of keeping the database
connection throughout the PM lifecycle and caching prepared statements
for performance than with any inherent simplicity of the
implementation approach. The main point seems to be that the current
implementation wants to prepare the statements it uses once during
initialization rather than once per method call. This is somewhat in
conflict with the J2EE best practice of keeping a database connection
and related resources like prepared statements only for the duration
of a single operation.
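
For readers less familiar with the code, here is a minimal sketch of the "long-lived" style described above. LongLivedStore and the NODE_TABLE(ID, DATA) table are hypothetical names for illustration, not Jackrabbit's actual classes or schema: one connection is held for the whole lifecycle, and the statement is prepared once in init() and then reused with clearParameters().

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical sketch of the "long-lived" style: LongLivedStore and the
// NODE_TABLE(ID, DATA) table are illustrative assumptions, not Jackrabbit code.
class LongLivedStore {
    private Connection con;               // held from init() until close()
    private PreparedStatement updateStmt; // prepared once, reused by every call

    void init(Connection con) throws SQLException {
        this.con = con;
        this.updateStmt = con.prepareStatement(
                "UPDATE NODE_TABLE SET DATA = ? WHERE ID = ?");
    }

    // Reuses the cached statement; only the parameter values change per call.
    int store(String id, byte[] data) throws SQLException {
        updateStmt.clearParameters();
        updateStmt.setBytes(1, data);
        updateStmt.setString(2, id);
        return updateStmt.executeUpdate();
    }

    // Releases the statement and the connection at the end of the lifecycle.
    void close() throws SQLException {
        updateStmt.close();
        con.close();
    }
}
```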

Incidentally there happens to be one approach that would keep the
performance advantages of the current approach, remove the conflict
with J2EE practices, and even simplify the implementation! Like this:

1) Change the DatabasePersistenceManager to get a database connection
and prepare the statements it uses for each operation, to comply with
J2EE practices.

2) Use the Commons DBCP DriverAdapterCPDS DataSource implementation
with PreparedStatement pooling in SimpleDbPersistenceManager to keep
the performance gains.

3) Remove the now unneeded Connection and PreparedStatement members
and resetStatement() method from the DatabasePersistenceManager class
to simplify the implementation.
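
Steps 1 and 3 together could look roughly like the following sketch. PerOperationStore and the NODE_TABLE(ID, DATA) table are hypothetical; the point is only that no Connection or PreparedStatement is kept in fields, and every call borrows and releases its own. Whether preparing the statement on every call stays cheap depends on the DataSource passed in: a pooling DataSource with statement pooling enabled, like the DBCP DriverAdapterCPDS setup in step 2, is what would be expected to recover the cost.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.sql.DataSource;

// Hypothetical sketch of steps 1 and 3: no JDBC resources are held in fields.
// PerOperationStore and NODE_TABLE(ID, DATA) are illustrative assumptions.
class PerOperationStore {
    private final DataSource dataSource;

    PerOperationStore(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    int store(String id, byte[] data) throws SQLException {
        // try-with-resources closes the statement, then the connection; with a
        // pooling DataSource, close() typically hands them back to the pool.
        try (Connection con = dataSource.getConnection();
             PreparedStatement stmt = con.prepareStatement(
                     "UPDATE NODE_TABLE SET DATA = ? WHERE ID = ?")) {
            stmt.setBytes(1, data);
            stmt.setString(2, id);
            return stmt.executeUpdate();
        }
    }
}
```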

The cost of this change would be a bit of pooling overhead for each
persistence manager operation (which should be insignificant compared to
the cost of the database operations) and the introduction of commons-dbcp
and commons-pool as dependencies.

This change would also make it clear that the responsibility for any extra
database shutdown operations, like those in the current
DerbyPersistenceManager, rests with the subclass, as the
DatabasePersistenceManager class would no longer keep any stable
reference to the underlying database.

What do you think? I can take a shot at implementing this if you think
it's worth doing.

BR,

Jukka Zitting

--
Yukatan - http://yukatan.fi/ - info@yukatan.fi
Software craftsmanship, JCR consulting, and Java development

Re: Thoughts on database persistence

Posted by Edgar Poce <ed...@gmail.com>.
On 3/18/06, Brian Moseley <bc...@osafoundation.org> wrote:
> On 3/18/06, Edgar Poce <ed...@gmail.com> wrote:
>
> > What are the benefits of using a jdbc-based PM implementation?
> > Only the rdbms administrative stuff, scheduled backups, etc.
>
> "only" seems to diminish the vast importance of these things. if
> jackrabbit provided more in the way of management tools and backup
> capability, people might be less anxious to put a database underneath
> it.
>

You are right, I was thinking mainly of performance. But sure, things
like auditing, backup and recovery are major issues.

Re: Thoughts on database persistence

Posted by Brian Moseley <bc...@osafoundation.org>.
On 3/18/06, Edgar Poce <ed...@gmail.com> wrote:

> What are the benefits of using a jdbc-based PM implementation?
> Only the rdbms administrative stuff, scheduled backups, etc.

"only" seems to diminish the vast importance of these things. if
jackrabbit provided more in the way of management tools and backup
capability, people might be less anxious to put a database underneath
it.

Re: Thoughts on database persistence

Posted by Stefan Guggisberg <st...@gmail.com>.
hi edgar

On 3/18/06, Edgar Poce <ed...@gmail.com> wrote:
> Hi to all,
>
>  more thoughts on database persistence ...
>
>   It seems that having a jdbc-based persistence manager as the default
> implementation misleads users; new and not-so-new users often think
> that jackrabbit will benefit from rdbms features, and they analyze
> jackrabbit internals in terms of j2ee best practices.
>
>   Keeping the SimpleDBPersistenceManager simple is a good option not
> only for the sake of simplicity, but also because other approaches are
> discouraged by design decisions. As Stefan has pointed out a few times,
> jackrabbit is designed to stand in its own right. This means that it's
> not designed to leverage any particular storage engine, rdbms
> included.
>
>   The fact that derby is the default PM doesn't mean it's the best option;
> there's overhead related to sql parsing and too many unused features.
> It took me a while to understand it :), but I agree that for now the
> best option is a simple, transactional btree implementation, as
> Stefan has been pointing out for a long time. Something like
> http://jdbm.sourceforge.net/ would probably be a better fit. Stefan,
> WDYT? Is it worth giving it a try?

i've never taken a closer look at jdbm, but it's certainly something worth
investigating.

>
> Since questions about leveraging rdbms capabilities arise on the
> mailing list all the time, if the comments above make any sense,
> I suggest adding a few more entries to the FAQ that make it clear
> Jackrabbit is not just a layer on top of an rdbms. WDYT?

yes, i agree.

cheers
stefan

Re: Thoughts on database persistence

Posted by Edgar Poce <ed...@gmail.com>.
Hi to all,

 more thoughts on database persistence ...

  It seems that having a jdbc-based persistence manager as the default
implementation misleads users; new and not-so-new users often think
that jackrabbit will benefit from rdbms features, and they analyze
jackrabbit internals in terms of j2ee best practices.

  Keeping the SimpleDBPersistenceManager simple is a good option not
only for the sake of simplicity, but also because other approaches are
discouraged by design decisions. As Stefan has pointed out a few times,
jackrabbit is designed to stand in its own right. This means that it's
not designed to leverage any particular storage engine, rdbms
included.

  The fact that derby is the default PM doesn't mean it's the best option;
there's overhead related to sql parsing and too many unused features.
It took me a while to understand it :), but I agree that for now the
best option is a simple, transactional btree implementation, as
Stefan has been pointing out for a long time. Something like
http://jdbm.sourceforge.net/ would probably be a better fit. Stefan,
WDYT? Is it worth giving it a try?

Since questions about leveraging rdbms capabilities arise on the
mailing list all the time, if the comments above make any sense,
I suggest adding a few more entries to the FAQ that make it clear
Jackrabbit is not just a layer on top of an rdbms. WDYT?

e.g.
----

I want to use jackrabbit in a j2ee environment and use JNDI
to configure jdbc connections. How can I do it?
You can override the default implementation and get connections
through JNDI, but keep in mind that using an rdbms in server mode
is not the best option. Jackrabbit *is* a storage engine by itself.

Does Jackrabbit leverage rdbms capabilities?
No, all Jackrabbit needs from a PersistenceManager implementation is a
simple transactional persistence mechanism that supports large
collections. A simple btree implementation suffices.

What are the benefits of using a jdbc-based PM implementation?
Only the rdbms administrative stuff, scheduled backups, etc.

---
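
For the first FAQ entry above, a minimal sketch of resolving a container-managed DataSource through JNDI instead of configuring a JDBC driver and URL directly. The class JndiDataSources and the name java:comp/env/jdbc/jackrabbit are assumptions for illustration; the real name depends on how the application server binds the resource.

```java
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

// Hypothetical helper; the default JNDI name below is an assumption.
final class JndiDataSources {
    static final String DEFAULT_NAME = "java:comp/env/jdbc/jackrabbit";

    private JndiDataSources() {}

    // Looks the name up in an explicit context and checks the bound type.
    static DataSource lookup(Context ctx, String name) throws NamingException {
        Object bound = ctx.lookup(name);
        if (!(bound instanceof DataSource)) {
            throw new NamingException(name + " is not bound to a javax.sql.DataSource");
        }
        return (DataSource) bound;
    }

    // Uses the environment's default InitialContext, as inside a container.
    static DataSource lookup(String name) throws NamingException {
        Context ctx = new InitialContext();
        try {
            return lookup(ctx, name);
        } finally {
            ctx.close();
        }
    }
}
```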

my 0,0002 cents, in case it's worth that much ;)
edgar

ps, congratulations to all. you are all doing a great job!!

Re: Thoughts on database persistence

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 3/18/06, Stefan Guggisberg <st...@gmail.com> wrote:
> 'Simple' also refers to use of a very simple data model instead of
> a fully normalized schema or some object-relational mapping.

Agreed. A different data model would require a fully separate PM class
(like in the orm- or dbd- contribs). I believe the
SimpleDbPersistenceManager data model is good for the current needs
and pretty much orthogonal to the way the database connection is
handled.

> those best practices apply to j2ee applications. the point is that i don't
> consider jackrabbit to be a j2ee application; jackrabbit is infrastructure
> and has different requirements for its persistence layer than a
> database application does.

Good point. In many cases, however, Jackrabbit lives in a J2EE
environment and, as expressed in JCR-313, there are legitimate needs
for using it within the constraints of existing database deployments.

> note that write operations must occur within a single transaction, i.e.
> you can't get a new connection for every write operation.

Ah, good point. That pretty much downs my proposal. So, withdrawn for now.

BR,

Jukka Zitting

Re: Thoughts on database persistence

Posted by Alexandru Popescu <th...@gmail.com>.
Hi!

I am one of those who brought this subject to the mailing list in the past (unfortunately not in as much detail as
Jukka did).

I tend to agree with Jukka from the perspective of known j2ee best practices. However, Stefan's
points are more important from a functionality and performance point of view. What looks really
interesting is that both ideas would work quite well together if:
- we can define a middle "persistence manager" layer that defines the needed atomic operations (so
this fulfills the requirement that all writes take place in the same transaction)
- we look at the current persistence manager as a JDBC-like single operation provider.

With the above in mind, one will be able to define:
- at the higher level: how the connection is handled
- in the current pm layer: how the implementation (real persistence access) is handled

I would probably need more knowledge of the current implementation details in order to be able to
come up with a full proposal, but I really hope that the devs will understand what I am
trying to say.
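
One possible reading of the two layers above, as a hypothetical sketch (RecordOperations, ConnectionSource and TransactionalWriter are invented names, not existing Jackrabbit interfaces): the lower layer performs single operations on a connection it is handed, while the upper layer decides where the connection comes from and owns the transaction, so all writes of one batch still commit together.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.Map;
import javax.sql.DataSource;

// Hypothetical lower layer: single operations on a connection they are handed.
// Implementations never open, commit, or close the connection themselves.
interface RecordOperations {
    void write(Connection con, String id, byte[] data) throws SQLException;
}

// Hypothetical upper-layer policy: where a connection comes from and what
// happens to it once a batch is done.
interface ConnectionSource {
    Connection acquire() throws SQLException;

    void release(Connection con) throws SQLException;

    // Current style: one connection kept open for the whole lifecycle.
    static ConnectionSource longLived(Connection con) {
        return new ConnectionSource() {
            public Connection acquire() { return con; }
            public void release(Connection c) { /* keep it open */ }
        };
    }

    // J2EE style: borrow from a (possibly pooling) DataSource per batch.
    static ConnectionSource perBatch(DataSource ds) {
        return new ConnectionSource() {
            public Connection acquire() throws SQLException { return ds.getConnection(); }
            public void release(Connection c) throws SQLException { c.close(); }
        };
    }
}

// Hypothetical upper layer: owns the transaction, so all writes of one batch
// commit or roll back together, whichever ConnectionSource is used.
class TransactionalWriter {
    private final ConnectionSource source;
    private final RecordOperations ops;

    TransactionalWriter(ConnectionSource source, RecordOperations ops) {
        this.source = source;
        this.ops = ops;
    }

    void writeAll(Map<String, byte[]> changes) throws SQLException {
        Connection con = source.acquire();
        try {
            con.setAutoCommit(false);
            for (Map.Entry<String, byte[]> e : changes.entrySet()) {
                ops.write(con, e.getKey(), e.getValue());
            }
            con.commit();
        } catch (SQLException | RuntimeException ex) {
            try {
                con.rollback();
            } catch (SQLException rollbackFailure) {
                ex.addSuppressed(rollbackFailure);
            }
            throw ex;
        } finally {
            source.release(con);
        }
    }
}
```

In this sketch longLived() roughly corresponds to the current approach and perBatch() to the J2EE style; in both cases the connection is held for a whole batch, never just for a single write.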

hope this is more than 2c :-)

./alex
--
.w( the_mindstorm )p.


#: Stefan Guggisberg changed the world a bit at a time by saying (astral date: 3/18/2006 6:11 PM) :#
> On 3/18/06, Jukka Zitting <ju...@gmail.com> wrote:
>> Hi,
>>
>> I just added JNDI/DataSource-based versions of the database
>> persistence manager and file system classes as requested in JCR-313.
>> The comments on the issue thread got me thinking about the current
>> "simple" approach and database configurability in general. The
>> database file system classes are pretty much mirrors of their
>> persistence manager counterparts, so I'll just focus on the PM classes
>> here; the same ideas apply to both situations.
>>
>> The JCR-313 issue thread focused largely on the question of
>> implementation "simplicity". After going through the code I think the
>> question has more to do with the approach of keeping the database
>> connection throughout the PM lifecycle and caching prepared statements
>> for performance than with any inherent simplicity of the
>> implementation approach.
> 
> 'Simple' also refers to use of a very simple data model instead of
> a fully normalized schema or some object-relational mapping.
> 
>> The main point seems to be that the current
>> implementation wants to prepare the statements it uses once during
>> initialization rather than once per method call. This is somewhat in
>> conflict with the J2EE best practice of keeping a database connection
>> and related resources like prepared statements only for the duration
>> of a single operation.
> 
> those best practices apply to j2ee applications. the point is that i don't
> consider jackrabbit to be a j2ee application; jackrabbit is infrastructure
> and has different requirements for its persistence layer than a
> database application does.
> 
>>
>> Incidentally there happens to be one approach that would keep the
>> performance advantages of the current approach, remove the conflict
>> with J2EE practices, and even simplify the implementation! Like this:
>>
>> 1) Change the DatabasePersistenceManager to get a database connection
>> and prepare the statements it uses for each operation, to comply with
>> J2EE practices.
> 
> note that write operations must occur within a single transaction, i.e.
> you can't get a new connection for every write operation.
> 
>>
>> 2) Use the Commons DBCP DriverAdapterCPDS DataSource implementation
>> with PreparedStatement pooling in SimpleDbPersistenceManager to keep
>> the performance gains.
>>
>> 3) Remove the now unneeded Connection and PreparedStatement members
>> and resetStatement() method from the DatabasePersistenceManager class
>> to simplify the implementation.
>>
>> The cost of this change would be a bit of pooling overhead for each
>> persistence manager operation (which should be insignificant compared to
>> the cost of the database operations) and the introduction of commons-dbcp
>> and commons-pool as dependencies.
>>
>> This change would also make it clear that the responsibility for any extra
>> database shutdown operations, like those in the current
>> DerbyPersistenceManager, rests with the subclass, as the
>> DatabasePersistenceManager class would no longer keep any stable
>> reference to the underlying database.
>>
>> What do you think? I can take a shot at implementing this if you think
>> it's worth doing.
> 
> -1 for changing SimpleDbPersistenceManager as suggested right now
> 
> on the other hand i have no problem with adding a new, more sophisticated
> db pm as suggested.
> 
> time and experience will tell if we want to keep them both or not.
> 
> cheers
> stefan
> 
>>
>> BR,
>>
>> Jukka Zitting
>>
> 


Re: Thoughts on database persistence

Posted by Stefan Guggisberg <st...@gmail.com>.
On 3/18/06, Jukka Zitting <ju...@gmail.com> wrote:
> Hi,
>
> I just added JNDI/DataSource-based versions of the database
> persistence manager and file system classes as requested in JCR-313.
> The comments on the issue thread got me thinking about the current
> "simple" approach and database configurability in general. The
> database file system classes are pretty much mirrors of their
> persistence manager counterparts, so I'll just focus on the PM classes
> here; the same ideas apply to both situations.
>
> The JCR-313 issue thread focused largely on the question of
> implementation "simplicity". After going through the code I think the
> question has more to do with the approach of keeping the database
> connection throughout the PM lifecycle and caching prepared statements
> for performance than with any inherent simplicity of the
> implementation approach.

'Simple' also refers to use of a very simple data model instead of
a fully normalized schema or some object-relational mapping.

> The main point seems to be that the current
> implementation wants to prepare the statements it uses once during
> initialization rather than once per method call. This is somewhat in
> conflict with the J2EE best practice of keeping a database connection
> and related resources like prepared statements only for the duration
> of a single operation.

those best practices apply to j2ee applications. the point is that i don't
consider jackrabbit to be a j2ee application; jackrabbit is infrastructure
and has different requirements for its persistence layer than a
database application does.

>
> Incidentally there happens to be one approach that would keep the
> performance advantages of the current approach, remove the conflict
> with J2EE practices, and even simplify the implementation! Like this:
>
> 1) Change the DatabasePersistenceManager to get a database connection
> and prepare the statements it uses for each operation, to comply with
> J2EE practices.

note that write operations must occur within a single transaction, i.e.
you can't get a new connection for every write operation.
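
To make this constraint concrete, a minimal JDBC sketch (AtomicChangeSet and the NODE_TABLE(ID, DATA) table are hypothetical): all writes of one change set go through the same Connection with auto-commit disabled, so they commit or roll back together. Obtaining a fresh connection for each write would normally give every write its own transaction, and a failure halfway through would leave a partial change set behind.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Map;

// Hypothetical sketch; AtomicChangeSet and NODE_TABLE(ID, DATA) are assumptions.
final class AtomicChangeSet {
    private AtomicChangeSet() {}

    static void writeAll(Connection con, Map<String, byte[]> changes) throws SQLException {
        boolean previousAutoCommit = con.getAutoCommit();
        con.setAutoCommit(false); // one transaction for the whole change set
        try (PreparedStatement stmt = con.prepareStatement(
                "UPDATE NODE_TABLE SET DATA = ? WHERE ID = ?")) {
            for (Map.Entry<String, byte[]> e : changes.entrySet()) {
                stmt.setBytes(1, e.getValue());
                stmt.setString(2, e.getKey());
                stmt.addBatch(); // queued on the same connection, not yet committed
            }
            stmt.executeBatch();
            con.commit(); // all writes become visible together
        } catch (SQLException | RuntimeException ex) {
            try {
                con.rollback(); // undo every write of this change set
            } catch (SQLException rollbackFailure) {
                ex.addSuppressed(rollbackFailure);
            }
            throw ex;
        } finally {
            con.setAutoCommit(previousAutoCommit);
        }
    }
}
```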

>
> 2) Use the Commons DBCP DriverAdapterCPDS DataSource implementation
> with PreparedStatement pooling in SimpleDbPersistenceManager to keep
> the performance gains.
>
> 3) Remove the now unneeded Connection and PreparedStatement members
> and resetStatement() method from the DatabasePersistenceManager class
> to simplify the implementation.
>
> The cost of this change would be a bit of pooling overhead for each
> persistence manager operation (which should be insignificant compared to
> the cost of the database operations) and the introduction of commons-dbcp
> and commons-pool as dependencies.
>
> This change would also make it clear that the responsibility for any extra
> database shutdown operations, like those in the current
> DerbyPersistenceManager, rests with the subclass, as the
> DatabasePersistenceManager class would no longer keep any stable
> reference to the underlying database.
>
> What do you think? I can take a shot at implementing this if you think
> it's worth doing.

-1 for changing SimpleDbPersistenceManager as suggested right now

on the other hand i have no problem with adding a new, more sophisticated
db pm as suggested.

time and experience will tell if we want to keep them both or not.

cheers
stefan

>
> BR,
>
> Jukka Zitting
>