Posted to dev@jackrabbit.apache.org by Jukka Zitting <ju...@gmail.com> on 2007/08/21 10:47:11 UTC

Multiple connections (Was: Jackrabbit, the database)

Hi,

[Branching the thread]

On 8/21/07, Thomas Mueller <th...@gmail.com> wrote:
> My question is: should we support multiple connections in the
> persistence manager?
>
> If using multiple connections really does improve performance /
> scalability of Jackrabbit, I think we should think about it (long
> term, for NGP). Using multiple connections does not mean using
> databases. Multiple connections can be supported with file based
> persistence managers as well (of course the transaction isolation
> level needs to be defined).
>
> First we need proof that using multiple connections does help.
>
> Some feedback was along the line 'Jackrabbit must use DataSources /
> Pooled Connections because... it is better than using one
> Connection!'. I would like to understand why and in what cases it is better.

The main argument I can see for multiple connections (or more
precisely a DataSource with a connection pool) within the database
persistence managers is to avoid the synchronization on the
PreparedStatements. Especially now that the locking in
SharedItemStateManager has become more fine-grained, there is a chance
that the synchronization within the persistence manager becomes a
bottleneck for certain access patterns.

Also, there is a valid argument that statements from a single
Connection should not be used concurrently from multiple threads.

Furthermore, from a code clarity point of view I would personally
prefer a solution that didn't require explicit synchronization.

BR,

Jukka Zitting

Re: Multiple connections (Was: Jackrabbit, the database)

Posted by Padraic Hannon <pi...@wasabicowboy.com>.
WebLogic and Oracle do something similar as well. Also, the PreparedStatement caches in those pools are configurable. Once I have the chance, hopefully this week, I'll get some testing done with the code I modified. Derby has a pooled connection system as well.

-pih


Thomas Mueller wrote:
> Hi,
>
>   
>> returns a prepared statement back to a pool (instead of really closing it)
>>     
>
> You are right, that's great!
>
> Then we should test if the PooledConnectionPersistenceManager really
> does improve the performance, and in what cases.
>
> Thomas
>   


Re: Multiple connections (Was: Jackrabbit, the database)

Posted by Thomas Mueller <th...@gmail.com>.
Hi,

> returns a prepared statement back to a pool (instead of really closing it)

You are right, that's great!

Then we should test if the PooledConnectionPersistenceManager really
does improve the performance, and in what cases.

Thomas

Re: Multiple connections (Was: Jackrabbit, the database)

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 8/21/07, Thomas Mueller <th...@gmail.com> wrote:
> > Any reasonable connection pool will pool also the prepared statements,
>
> I would not be so sure. Maybe if you don't close the prepared
> statement (not sure about the disadvantages of that). In any case it
> needs to be tested.

At least commons-dbcp (what I'm most familiar with) returns a prepared
statement back to a pool (instead of really closing it) when the
close() method is called.

BR,

Jukka Zitting

Re: Multiple connections (Was: Jackrabbit, the database)

Posted by Thomas Mueller <th...@gmail.com>.
>             PreparedStatement ps = connection.prepareStatement(...);
>             try {
>                 // use the prepared statement
>             } finally {
>                 ps.close();
>             }

> Any reasonable connection pool will pool also the prepared statements,

I would not be so sure. Maybe if you don't close the prepared
statement (not sure about the disadvantages of that). In any case it
needs to be tested.

Thomas

Re: Multiple connections (Was: Jackrabbit, the database)

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 8/21/07, Thomas Mueller <th...@gmail.com> wrote:
> Reusing prepared statements is harder with pooled connections. In JDK
> 1.6, thanks to javax.sql.StatementEventListener, they can if the
> connection pool manager supports it. In JDK 1.4 and 1.5, I am not sure
> what / when connection pool managers can reuse them.

If we did use a pooled DataSource, then I'd preferably drop all the
long-lived prepared statements that are currently kept as member
variables by the database persistence managers. Individual persistence
methods would become something like this:

    void foo() throws ... {
        Connection connection = datasource.getConnection();
        try {
            PreparedStatement ps = connection.prepareStatement(...);
            try {
                // use the prepared statement
            } finally {
                ps.close();
            }
        } finally {
            connection.close();
        }
    }

Any reasonable connection pool will pool also the prepared statements,
so the above code would normally only require an extra hash lookup and
some bookkeeping overhead.
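
To make the "extra hash lookup" concrete, here is a minimal sketch of how a pool might keep idle statements per SQL string and hand them back on the next prepare. This is only an illustration, not the commons-dbcp implementation: `StatementCacheSketch`, `borrow`, `giveBack`, and the SQL text are made-up names, and a plain object stands in for a real PreparedStatement so the sketch runs without a JDBC driver.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only; not how any particular pool is implemented.
// One such cache would belong to one pooled connection, so it is not thread-safe.
class StatementCacheSketch<S> {
    // Idle statements, keyed by their SQL text.
    private final Map<String, Deque<S>> idle = new HashMap<>();
    // Stands in for Connection.prepareStatement(sql).
    private final Function<String, S> preparer;
    private int prepareCount = 0;

    StatementCacheSketch(Function<String, S> preparer) {
        this.preparer = preparer;
    }

    // Returns an idle statement for this SQL if one exists, otherwise prepares a new one.
    S borrow(String sql) {
        Deque<S> queue = idle.get(sql); // the extra hash lookup
        if (queue != null && !queue.isEmpty()) {
            return queue.pop();
        }
        prepareCount++;
        return preparer.apply(sql);
    }

    // What a pooling wrapper could do when close() is called: keep the statement for reuse.
    void giveBack(String sql, S statement) {
        idle.computeIfAbsent(sql, key -> new ArrayDeque<>()).push(statement);
    }

    int getPrepareCount() {
        return prepareCount;
    }

    public static void main(String[] args) {
        StatementCacheSketch<Object> cache = new StatementCacheSketch<>(sql -> new Object());
        String sql = "select DATA from SOME_TABLE where ID = ?"; // made-up SQL
        Object first = cache.borrow(sql);
        cache.giveBack(sql, first);
        Object second = cache.borrow(sql);
        System.out.println("same statement reused: " + (first == second));
        System.out.println("statements prepared: " + cache.getPrepareCount());
    }
}
```

Whether a given pool actually behaves like this depends on the pool and its configuration, so it still needs to be tested against the real thing.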

BR,

Jukka Zitting

Re: Multiple connections (Was: Jackrabbit, the database)

Posted by Thomas Mueller <th...@gmail.com>.
Hi,

> a larger binary
> A simple insert of 10MB will lock up JR for a few seconds, this is quite a
> problem IMO.

Sounds like multiple connections would help here... I didn't really
think about large binaries so far, because I thought they are usually
processed by the blob store (or global data store in the future).

> Another advantage is that jdbc pools can health check connections.

Maybe we should create a wiki page and put the pros and cons there.

Reusing prepared statements is harder with pooled connections. In JDK
1.6, thanks to javax.sql.StatementEventListener, they can if the
connection pool manager supports it. In JDK 1.4 and 1.5, I am not sure
what / when connection pool managers can reuse them.

Thomas

RE: Multiple connections (Was: Jackrabbit, the database)

Posted by Michael Roberts <Mi...@webex.com>.
> A single DB connection can only process a single operation at a time.
> Jackrabbit locks up completely while storing (example: a larger binary)
> - not only for reading but also for writing.
> There have been some improvements AFAIK, but it still applies for write
> operations.
> A simple insert of 10MB will lock up JR for a few seconds, this is quite a
> problem IMO.

This was causing huge problems for us. A quick and dirty rewrite to
remove synchronization and start using connection pools in the PM has
almost completely fixed the problem for us. And though I have no hard
numbers to back this up, it also seems to have made Jackrabbit
"snappier".

~mike

Re: Multiple connections (Was: Jackrabbit, the database)

Posted by Marcel May <ma...@consol.de>.
Thomas Mueller wrote:
>> avoid the synchronization on the PreparedStatements
>>     
>
> I don't think that synchronization on prepared statements is a
> bottleneck. But you can prove that I am wrong. If writing the
> changelog is synchronized (not sure if it is), that would be a
> bottleneck.
>
> Thomas
>   
The 'synchronization' can also be described as serialization of all DB
access.
A single DB connection can only process a single operation at a time.
Jackrabbit locks up completely while storing (example: a larger binary)
- not only for reading but also for writing.
There have been some improvements AFAIK, but it still applies for write
operations.
A simple insert of 10MB will lock up JR for a few seconds, this is quite a
problem IMO.
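
The serialization effect can be illustrated with a small simulation. This is a sketch under stated assumptions, not a Jackrabbit benchmark: `ConnectionContentionSketch` and `peakConcurrency` are made-up names, integers stand in for connections, and `Thread.sleep` stands in for a slow insert. It reports how many simulated operations were ever in flight at the same time.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical simulation: no real database is involved.
class ConnectionContentionSketch {

    // Runs `tasks` simulated operations over `connections` simulated connections
    // and returns the highest number of operations that ran at the same time.
    static int peakConcurrency(int connections, int tasks, long workMillis) {
        BlockingQueue<Integer> pool = new ArrayBlockingQueue<>(connections);
        for (int i = 0; i < connections; i++) {
            pool.add(i);
        }
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService executor = Executors.newFixedThreadPool(tasks);
        for (int t = 0; t < tasks; t++) {
            executor.submit(() -> {
                Integer connection = pool.take(); // blocks while every connection is busy
                try {
                    peak.accumulateAndGet(active.incrementAndGet(), Math::max);
                    Thread.sleep(workMillis); // stands in for a slow insert
                } finally {
                    active.decrementAndGet();
                    pool.add(connection); // hand the connection back
                }
                return null;
            });
        }
        executor.shutdown();
        try {
            executor.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak with 1 connection:  " + peakConcurrency(1, 4, 50));
        System.out.println("peak with 4 connections: " + peakConcurrency(4, 4, 50));
    }
}
```

With one connection the peak is always 1, so every other caller waits for the slow operation to finish. With several connections the peak can rise up to the pool size, though the exact value depends on thread scheduling.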

Another advantage is that jdbc pools can health check connections.
This reduces the complex firewall- or database-related 'closed connection
reconnect' logic, if not making it redundant.
This improves stability.
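
As a rough sketch of what such a health check could look like, the following validates a connection with `Connection.isValid` (available since JDK 1.6) before handing it out and drops stale ones. The class `ValidatingPoolSketch` and its methods are made up for illustration, and `fakeConnection` only exists so the example runs without a database; real pools have their own validation settings and would open a fresh connection instead of returning null.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration of validate-on-borrow; not taken from any real pool.
class ValidatingPoolSketch {
    private final Deque<Connection> idle = new ArrayDeque<>();

    void add(Connection connection) {
        idle.addLast(connection);
    }

    // Returns the first idle connection that passes isValid, closing stale ones.
    // Returns null when no usable connection is left.
    Connection borrow(int timeoutSeconds) {
        Connection candidate;
        while ((candidate = idle.pollFirst()) != null) {
            try {
                if (candidate.isValid(timeoutSeconds)) {
                    return candidate;
                }
                candidate.close(); // stale, e.g. dropped by a firewall
            } catch (SQLException e) {
                // Treat a failing connection as broken and try the next one.
            }
        }
        return null;
    }

    // Demo helper: a fake Connection whose isValid answer is fixed.
    // Only isValid and close are expected to be called on it.
    static Connection fakeConnection(boolean valid) {
        return (Connection) Proxy.newProxyInstance(
                ValidatingPoolSketch.class.getClassLoader(),
                new Class<?>[] { Connection.class },
                (proxy, method, methodArgs) -> {
                    if (method.getName().equals("isValid")) {
                        return valid;
                    }
                    return null;
                });
    }

    public static void main(String[] args) {
        ValidatingPoolSketch pool = new ValidatingPoolSketch();
        Connection stale = fakeConnection(false);
        Connection live = fakeConnection(true);
        pool.add(stale);
        pool.add(live);
        System.out.println("got live connection: " + (pool.borrow(1) == live));
    }
}
```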

Also nice: pools are usually monitorable, and most app servers support
this.

Cheers,
Marcel

Re: Multiple connections (Was: Jackrabbit, the database)

Posted by Thomas Mueller <th...@gmail.com>.
> avoid the synchronization on the PreparedStatements

I don't think that synchronization on prepared statements is a
bottleneck. But you can prove that I am wrong. If writing the
changelog is synchronized (not sure if it is), that would be a
bottleneck.

Thomas