Posted to users@jackrabbit.apache.org by majohnst <ma...@lattaoutdoors.com> on 2009/04/28 15:47:29 UTC

Database Connections and Performance

I am trying to better understand how Jackrabbit performs queries on a MySQL
database file store/persistence manager. I understand that when Jackrabbit
starts, it creates several persistent connections to the database: one for
the datastore, one for the workspace, one for the journal (for clustering),
etc. Jackrabbit cannot use connection pools, so there is no need to set up a
database connection pool.

My question is, when I query Jackrabbit (or perform another JCR operation
that needs to read from the MySQL database), is there really only one
connection to my MySQL database? Do all queries go into a queue, with only
one query executed at a time? So there is no notion of simultaneous querying
with Jackrabbit? Is there any way to increase the number of queries
Jackrabbit can do at one time?

Our application has a very high number of concurrent users and we are seeing
a slowdown because Jackrabbit cannot execute the queries fast enough. I am
hoping that there is a way I can increase the number of queries that
Jackrabbit can execute at one time.
-- 
View this message in context: http://www.nabble.com/Database-Connections-and-Performance-tp23277508p23277508.html
Sent from the Jackrabbit - Users mailing list archive at Nabble.com.

Re: Database Connections and Performance

Posted by Alessandro Bologna <al...@gmail.com>.

Hi guys,
I would just like to clarify that Jackrabbit does *not* use the database for
queries, even if you use the SQL-like syntax.

Queries are executed by the JVM against the Lucene indexes (which are stored
on your local disk). The result of these queries is (maybe I am simplifying
a bit) a sequence of UUIDs that are then used to retrieve the corresponding
serialized node data from the DB. The only type of "query" you would see on
the database is one that retrieves the corresponding nodes (something along
the lines of "select * from table where uuid = '...'").
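
The two-phase flow described above can be sketched with a toy model. This is
not Jackrabbit code: the class TwoPhaseLookup and all of its members are
hypothetical, a plain map stands in for the Lucene index and another map
stands in for the database table of serialized nodes. It only illustrates that
the search itself is answered by the index, while the "database" sees nothing
but by-id reads.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Toy model only: none of these names are Jackrabbit classes.
class TwoPhaseLookup {
    // Stands in for the local Lucene index: search term -> ids of matching nodes.
    private final Map<String, List<UUID>> index = new HashMap<>();
    // Stands in for the database table of serialized nodes, keyed by id.
    private final Map<UUID, String> nodeTable = new HashMap<>();
    // Counts by-id reads against the stand-in database.
    private int databaseReads = 0;

    UUID addNode(String term, String serializedNode) {
        UUID id = UUID.randomUUID();
        nodeTable.put(id, serializedNode);
        index.computeIfAbsent(term, k -> new ArrayList<>()).add(id);
        return id;
    }

    List<String> query(String term) {
        // Phase 1: the index alone answers the search; no database access.
        List<UUID> hits = index.getOrDefault(term, Collections.emptyList());
        // Phase 2: one by-id read per hit to fetch the serialized node.
        List<String> nodes = new ArrayList<>();
        for (UUID id : hits) {
            databaseReads++;
            nodes.add(nodeTable.get(id));
        }
        return nodes;
    }

    int databaseReads() {
        return databaseReads;
    }

    public static void main(String[] args) {
        TwoPhaseLookup repo = new TwoPhaseLookup();
        repo.addNode("tent", "node-a");
        repo.addNode("tent", "node-b");
        repo.addNode("boot", "node-c");
        System.out.println(repo.query("tent") + " database reads: " + repo.databaseReads());
    }
}
```

In this model a search with no hits causes no database reads at all, which is
the point being made: query cost is dominated by the index, not by the
database connection.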

So, the number of concurrent connections to the database has very little to
do with how fast your queries will execute.

If you experience a slowdown with many users, and you are sure that the
issue is in the JCR queries, you will benefit from more memory on your app
server, faster CPU(s) and potentially a faster local disk. If none of these
helps (or none is feasible for you), try revisiting your queries. In some
cases, you may be able to do without them, or simplify them. Read this post
(http://www.nabble.com/Explanation-and-solutions-of-some-Jackrabbit-queries-regarding-performance-td15028655.html)
to see if it helps to optimize them a bit.

Alessandro





Re: Database Connections and Performance

Posted by Ian Boston <ie...@tfd.co.uk>.

OK, I should state that I haven't used OCM, but from looking at it I
think it binds to javax.jcr.*
If not, ignore my comments (but please let me know so I learn :) )

On 28 Apr 2009, at 15:44, majohnst wrote:

>
> Thanks for the response. To give a little more information about my
> situation, we are not seeing excess sql traffic, we are more concerned with
> the time required to execute a query from the repository.
>
> In our setup, we are using a Spring/Tomcat setup and using Jackrabbit OCM to
> map our entities. We have noticed that as the number of concurrent users on
> our website increases, the query performance goes way down. So our page load
> times increase dramatically. As best as I can tell, the pages are waiting
> for a jackrabbit query to execute and a backlog of jackrabbit operations
> begins to form, slowing down the page loads. When you say jackrabbit is
> multi-threaded:
>
>
>
> Ian Boston wrote:
>>
>> Jackrabbit has its own multi threaded state management. Everything is
>> focused on serving information from memory and not performing a query
>> against the database. Only when a session needs to get something that
>> isn't in one of the shared caches will you see queries hitting the
>> database.
>>
>
> Do you mean that it is executing queries against the repository in a
> multi-threaded way (many concurrent queries)?

I think we need to differentiate queries.

Queries above Jackrabbit at the javax.jcr level are either direct
(javax.jcr.Session.getNode() etc), indirect (Node.getNode()) or
javax.jcr.query.*. All of these go through the JCR Session, which is
not thread safe, but there will be thousands of sessions and they will
all be able to access shared node state *without* generating SQL queries.

So there can and will be many javax.jcr.* 'queries' all multi threaded.

AFAIK, java.sql.Connections are not generally thread safe (driver  
dependent) so SQL queries will be serialized even if multiple threads  
are requesting them.
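
A minimal sketch of that idea, assuming a toy model rather than Jackrabbit's
real internals: many threads read node state through a shared cache, and only
cache misses reach a single "connection" that is guarded by a lock, so the
loads on it are serialized. The class SharedCacheModel and everything in it
are hypothetical names made up for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model only: not Jackrabbit code.
class SharedCacheModel {
    // Shared node state that all reader threads can see.
    private final ConcurrentHashMap<String, String> sharedCache = new ConcurrentHashMap<>();
    // Guards the single stand-in connection, so loads run one at a time.
    private final Object connectionLock = new Object();
    private final AtomicInteger databaseLoads = new AtomicInteger();

    String read(String nodeId) {
        // computeIfAbsent applies the loader at most once per key.
        return sharedCache.computeIfAbsent(nodeId, this::loadFromDatabase);
    }

    private String loadFromDatabase(String nodeId) {
        synchronized (connectionLock) {
            databaseLoads.incrementAndGet();
            return "state-of-" + nodeId;
        }
    }

    int databaseLoads() {
        return databaseLoads.get();
    }

    // Runs the given number of reader threads and returns how many loads hit the "database".
    static int simulate(int threads, int readsPerThread, int distinctNodes) {
        SharedCacheModel model = new SharedCacheModel();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < readsPerThread; i++) {
                    model.read("node-" + (i % distinctNodes));
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("readers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return model.databaseLoads();
    }

    public static void main(String[] args) {
        System.out.println("database loads: " + simulate(50, 1000, 10));
    }
}
```

With 50 threads each doing 1000 reads over 10 distinct nodes, the model
performs 50,000 reads but only one load per distinct node, which is why a
single serialized connection need not be the bottleneck when most reads are
served from memory.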

> Since we are using spring in a
> web app, is this considered one session, or multiple sessions?

If Spring uses a single javax.jcr.Session then it will be one. I would
expect it to use one per request thread.

> The ultimate
> goal is to increase the speed of our queries. We have already read over the
> tips regarding how to write queries effectively and how to structure the
> repository. Now I am thinking we are running into code issues either with
> the Jackrabbit query logic or in OCM mapping that is slowing the process
> down.

I think you need an OCM expert (not me sorry) as it could be that  
there is something going on there.

If you want to check whether Jackrabbit itself is the problem, rather
than OCM, try using ApacheBench (ab) against a JCR repository.

eg
ab -c200 -n1000 http://localhost:8080/some/url/that/gets/the/content/of/a/jcr/file.html

ie 1000 requests in total, with up to 200 of them running concurrently.
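
For a test that bypasses the web layer, the same load shape can be sketched in
plain Java. ab documents -n as the total number of requests and -c as the
number of requests performed at a time; the hypothetical LoadSketch harness
below mirrors that with a fixed-size thread pool. The "request" is only a
placeholder Runnable (here a 1 ms sleep); a real test would put a repository
read there.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical harness, not part of Jackrabbit or ab.
class LoadSketch {
    static final class Result {
        final int completed;
        final int peakInFlight;
        final long elapsedMillis;

        Result(int completed, int peakInFlight, long elapsedMillis) {
            this.completed = completed;
            this.peakInFlight = peakInFlight;
            this.elapsedMillis = elapsedMillis;
        }
    }

    // Issues totalRequests calls in total, with at most `concurrency` running at once.
    static Result run(int totalRequests, int concurrency, Runnable request) {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        AtomicInteger completed = new AtomicInteger();
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        long start = System.nanoTime();
        for (int i = 0; i < totalRequests; i++) {
            pool.execute(() -> {
                // Track the highest number of requests seen running at the same time.
                peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                try {
                    request.run();
                    completed.incrementAndGet();
                } finally {
                    inFlight.decrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(5, TimeUnit.MINUTES)) {
                throw new IllegalStateException("load run did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        return new Result(completed.get(), peak.get(), elapsed);
    }

    // Placeholder for a real repository read.
    static void fakeRequest() {
        try {
            Thread.sleep(1);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        Result r = run(1000, 200, LoadSketch::fakeRequest);
        System.out.println(r.completed + " requests, peak in flight " + r.peakInFlight
                + ", " + r.elapsedMillis + " ms");
    }
}
```

Comparing the elapsed time at a few concurrency levels (say 1, 20 and 200)
with the same total gives a rough idea of whether throughput stops scaling as
concurrency rises.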

Ian

Re: Database Connections and Performance

Posted by majohnst <ma...@lattaoutdoors.com>.

Thanks for the response. To give a little more information about my
situation: we are not seeing excess SQL traffic; we are more concerned with
the time required to execute a query from the repository.

We are using a Spring/Tomcat setup with Jackrabbit OCM to map our entities.
We have noticed that as the number of concurrent users on our website
increases, the query performance goes way down, so our page load times
increase dramatically. As best as I can tell, the pages are waiting for a
Jackrabbit query to execute and a backlog of Jackrabbit operations begins to
form, slowing down the page loads. When you say Jackrabbit is
multi-threaded:

Ian Boston wrote:
> 
> Jackrabbit has its own multi threaded state management. Everything is  
> focused on serving information from memory and not performing a query  
> against the database. Only when a session needs to get something that  
> isn't in one of the shared caches will you see queries hitting the  
> database.
> 

Do you mean that it is executing queries against the repository in a
multi-threaded way (many concurrent queries)? Since we are using Spring in a
web app, is this considered one session, or multiple sessions? The ultimate
goal is to increase the speed of our queries. We have already read over the
tips on how to write queries effectively and how to structure the
repository. Now I am thinking we are running into code issues, either in
the Jackrabbit query logic or in the OCM mapping, that are slowing the
process down.

Re: Database Connections and Performance

Posted by Ian Boston <ie...@tfd.co.uk>.

Jackrabbit has its own multi-threaded state management. Everything is
focused on serving information from memory and not performing a query
against the database. Only when a session needs to get something that
isn't in one of the shared caches will you see queries hitting the
database.

On the persistence manager connection you will probably see a stream of
insert and update statements (perhaps with some selects), but you will
not see the same volume of selects that you would see with a normal RDBMS
application, since most activity will be served from the shared cache.

The one connection where you will see activity is the Journal
connection, which (last time I looked, in 1.4) needs to serialize
journal creation so that all nodes can perform replays in step. Hence
there will be a select to generate a new sequence, and an insert to
save the sequence.

The side effect of all of this is that Jackrabbit can support a higher
level of concurrency in the web application layer than you would be
able to support if each request thread required a database connection.
The JCR sessions are lighter weight than JDBC sessions simply because
they don't require a network connection (although pooling removes
that need).

However, if you are seeing excessive SQL traffic, and would like to
reduce it, using an XA transaction manager (e.g. from JTA) will defer
the SQL traffic until the transaction is committed and the transient
state is persisted from the session state.

To make Jackrabbit use one connection per thread, you would have to
refactor quite a lot of code above the persistence manager. I suspect
writing a new Jackrabbit SPI implementation would do this, but that's
certainly not trivial.

HTH
Ian
