Posted to dev@jackrabbit.apache.org by Jukka Zitting <ju...@gmail.com> on 2007/12/20 08:39:51 UTC

Database connection pooling in Jackrabbit

Hi,

I'm again raising the old issue (see for example [1] and [2]) of using
a connection pool in the database persistence managers. Previously
the arguments for pooling have been countered, but the issue is becoming
more pressing with the fine-grained locking feature, and recently I think
we've been moving towards inventing our own DataSource abstraction
with the ConnectionRecoveryManager class. I'd like to take these two
developments together and again suggest that we switch to using
connection pools in Jackrabbit.

The main benefits from using a connection pool would be:

1) Removing the database connection bottleneck. With fine-grained
locking, and even before that with large binaries in the database, we
are facing the problem that the single database connection used by a
persistence manager becomes a bottleneck for serving multiple
concurrent clients. Currently a single client that performs a large
import or retrieves a large binary will block all uncached access to
an entire workspace. This is a major performance and usability issue.

2) Simplified connection recovery code. Most connection pools will
automatically discard broken connections, removing the need for
explicit reconnect logic in Jackrabbit. There's lots of complex
"trials--" code in Jackrabbit that we could just drop if we used a
connection pool.

3) Simplified prepared statement caching. Many connection pools are
able to automatically cache prepared statements, removing the need for
explicit statement caching in Jackrabbit.

4) Familiar connection model. Many potential contributors are quite
familiar with the J2EE best practices for using database connections,
so adopting them in Jackrabbit would lower the barrier to entry for
new contributions. The familiar connection model will also make many
database administrators (who're typically the ones who need to manage
Jackrabbit instances in production) happier.
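
To make the connection model concrete, here is a minimal sketch of a
read path that borrows a connection per call instead of sharing one
connection per persistence manager. The class name PooledBundleReader
and the table and column names in the SQL are purely illustrative
assumptions, not existing Jackrabbit code:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

/** Hypothetical reader; illustrates borrow, use, return on a DataSource. */
final class PooledBundleReader {

    // Illustrative SQL only; the real schema may differ.
    private static final String SELECT_SQL =
            "SELECT BUNDLE_DATA FROM BUNDLE WHERE NODE_ID = ?";

    private final DataSource dataSource;

    PooledBundleReader(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    /** Returns the stored bytes for the given id, or null if no row matches. */
    byte[] load(String nodeId) throws SQLException {
        // Each call borrows its own connection, so one slow reader does
        // not hold a connection that other readers are waiting for.
        try (Connection connection = dataSource.getConnection();
             PreparedStatement statement = connection.prepareStatement(SELECT_SQL)) {
            statement.setString(1, nodeId);
            try (ResultSet result = statement.executeQuery()) {
                return result.next() ? result.getBytes(1) : null;
            }
        } // close() hands a pooled connection back to the pool
    }
}
```

With a pooling DataSource, close() returns the connection to the pool
rather than tearing it down, which is the part most J2EE developers
already expect.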

WDYT?

[1] http://markmail.org/message/oz376dss2whtmrww
[2] http://markmail.org/message/ai72otqjqmgapfyp

BR,

Jukka Zitting

Re: Database connection pooling in Jackrabbit

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Dec 20, 2007 3:40 PM, Stefan Guggisberg <st...@gmail.com> wrote:
> here's a couple random thoughts/comments:
>
> - J2EE infrastructure/environment should not be mandated;
>   it should still be possible to use jackrabbit (with jdbc based
>   persistence) standalone.

+1. I think commons-dbcp will help us here. The following code snippet
gives you a connection pool built from our existing configuration
information:

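    // Assumes the Commons DBCP 1.x package names.
    import org.apache.commons.dbcp.cpdsadapter.DriverAdapterCPDS;
    import org.apache.commons.dbcp.datasources.SharedPoolDataSource;

    // Wrap the plain JDBC settings we already have in the
    // persistence manager configuration.
    DriverAdapterCPDS cpds = new DriverAdapterCPDS();
    cpds.setDriver(...);
    cpds.setUrl(...);
    cpds.setUser(...);
    cpds.setPassword(...);

    // Pool connections from that source behind a standard DataSource.
    SharedPoolDataSource datasource = new SharedPoolDataSource();
    datasource.setConnectionPoolDataSource(cpds);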

> - we'll have to make sure that connections from an external pool
>   are not enlisted in external/distributed transactions.

True. This will likely require quite a bit of documenting. Is there
perhaps some way we could ask a DataSource or a Connection whether it
participates in a distributed transaction?

> - we'll have to make sure that connections from an external pool
>   have the correct tx isolation level (read committed)

We might want to add a Connection.getTransactionIsolation() check to
be performed at least during persistence manager initialization.
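
A minimal sketch of such an initialization check, assuming a
hypothetical helper class IsolationCheck (not part of Jackrabbit). It
relies only on the standard java.sql.Connection API:

```java
import java.sql.Connection;
import java.sql.SQLException;

/** Hypothetical helper; the name and placement are assumptions. */
final class IsolationCheck {

    private IsolationCheck() {
    }

    /**
     * Throws an SQLException unless the given connection reports the
     * READ COMMITTED transaction isolation level.
     */
    static void requireReadCommitted(Connection connection) throws SQLException {
        int level = connection.getTransactionIsolation();
        if (level != Connection.TRANSACTION_READ_COMMITTED) {
            throw new SQLException(
                    "Expected TRANSACTION_READ_COMMITTED ("
                    + Connection.TRANSACTION_READ_COMMITTED
                    + ") but the connection reports level " + level);
        }
    }
}
```

The check could run once against a connection borrowed during
persistence manager initialization. It would not catch a pool whose
settings are changed later.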

> - all write operations associated with a specific changelog
>   must be done through the *same* connection.

Yes. That'll probably require the most refactoring of existing code, as
AbstractBundlePersistenceManager.store() currently divides the work
among separate methods with no shared state apart from the persistence
manager instance itself.
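
One possible shape for that refactoring, as a rough sketch only:
borrow a single connection per change log and pass it explicitly to
every write step. ChangeLogWriter and WriteStep are hypothetical names
standing in for the existing store methods, not actual Jackrabbit
classes:

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.List;
import javax.sql.DataSource;

/** Hypothetical writer; shows one connection shared by all steps of a change log. */
final class ChangeLogWriter {

    /** One write operation of a change log (stand-in for the per-item store methods). */
    interface WriteStep {
        void apply(Connection connection) throws SQLException;
    }

    private final DataSource dataSource;

    ChangeLogWriter(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    /** Runs all steps on the same borrowed connection and commits them together. */
    void store(List<WriteStep> steps) throws SQLException {
        Connection connection = dataSource.getConnection();
        boolean committed = false;
        try {
            connection.setAutoCommit(false);
            for (WriteStep step : steps) {
                step.apply(connection); // every step sees the same connection
            }
            connection.commit();
            committed = true;
        } finally {
            try {
                if (!committed) {
                    connection.rollback(); // undo partial writes on any failure
                }
            } finally {
                connection.close(); // hand the connection back to the pool
            }
        }
    }
}
```

The connection becomes the shared state that the separate methods
currently lack, without being stored as a field on the persistence
manager.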

BR,

Jukka Zitting

Re: Database connection pooling in Jackrabbit

Posted by Stefan Guggisberg <st...@gmail.com>.
On Dec 20, 2007 8:39 AM, Jukka Zitting <ju...@gmail.com> wrote:
> Hi,
>
> I'm again raising the old issue (see for example [1] and [2]) of using
> a connection pool in the database persistence managers. Previously
> those arguments have been countered but the issue is becoming more
> pressing with the fine grained locking feature and recently I think
> we've been moving towards inventing our own DataSource abstraction
> with the ConnectionRecoveryManager class. I'd like to take these two
> developments together and to again suggest that we switch to using
> connection pools in Jackrabbit.
>
> The main benefits from using a connection pool would be:
>
> 1) Removing the database connection bottleneck. With fine grained
> locking and even before that with large binaries in the database we
> are facing the problem that the single database connection used by a
> persistence manager becomes a bottleneck for serving multiple
> concurrent clients. Currently a single client that performs a large
> import or retrieves a large binary will block all uncached access to
> an entire workspace. This is a major performance and usability issue.
>
> 2) Simplified connection recovery code. Most connection pools will
> automatically discard broken connections, removing the need for
> explicit reconnect logic in Jackrabbit. There's lots of complex
> "trials--" code in Jackrabbit that we could just drop if we used a
> connection pool.
>
> 3) Simplified prepared statement caching. Many connection pools are
> able to automatically cache prepared statements, removing the need for
> explicit statement caching in Jackrabbit.
>
> 4) Familiar connection model. Many potential contributors are quite
> familiar with the J2EE best practices for using database connections,
> so adopting them in Jackrabbit would lower the barrier of entry for
> new contributions. The familiar connection model will also make many
> database administrators (who're typically the ones who need to manage
> Jackrabbit instances in production) happier.
>
> WDYT?

i agree that in the light of recent development (e.g. JCR-314) it is
now time to revisit the jdbc connection pooling debate.  i generally
agree that now, using more fine grained locking strategies in jackrabbit
core, pooled connections could potentially improve read
concurrency/performance (i'm not sure whether that's also true for write
concurrency...).

here's a couple random thoughts/comments:

- J2EE infrastructure/environment should not be mandated;
  it should still be possible to use jackrabbit (with jdbc based
  persistence) standalone.
- we'll have to make sure that connections from an external pool
  are not enlisted in external/distributed transactions.
- we'll have to make sure that connections from an external pool
  have the correct tx isolation level (read committed)
- all write operations associated with a specific changelog
  must be done through the *same* connection.

cheers
stefan


>
> [1] http://markmail.org/message/oz376dss2whtmrww
> [2] http://markmail.org/message/ai72otqjqmgapfyp
>
> BR,
>
> Jukka Zitting
>