Posted to dev@jackrabbit.apache.org by "kazim_ssuet@yahoo.com" <ka...@yahoo.com> on 2010/11/09 20:39:35 UTC

Jackrabbit performance in multi threaded environment

While testing Jackrabbit performance, I am running multiple threads.
A few threads are loading documents, a few are fetching them and a few are
updating properties on nodes.

Initially I was logging into the repository with adminId/admin for all requests,
and all threads deadlocked, so nothing was moving.
Then I changed the login so that only the threads that modify properties or
load documents log in as admin, and the rest log in anonymously
(read-only mode). Things started moving, but performance still decreases a lot
as the thread count increases. Even just 10 threads of each operation
(load, fetch, modify) take a heavy toll on performance.
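To make a report like this reproducible, a small load harness can help. This is a minimal sketch using only java.util.concurrent; the class name MixedWorkload and the three Runnable arguments are hypothetical stand-ins for the real load, fetch and modify calls, which are not shown in this thread. Timing the same run at different values of perKind shows how throughput scales with the thread count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical load harness: perKind threads each for load, fetch and modify.
class MixedWorkload {

    /** Runs each operation `iterations` times on each of its threads; returns completed operations. */
    static long run(int perKind, int iterations, Runnable load, Runnable fetch, Runnable modify) {
        ExecutorService pool = Executors.newFixedThreadPool(perKind * 3);
        AtomicLong completed = new AtomicLong();
        List<Future<?>> futures = new ArrayList<>();
        for (Runnable op : new Runnable[] {load, fetch, modify}) {
            for (int t = 0; t < perKind; t++) {
                futures.add(pool.submit(() -> {
                    for (int i = 0; i < iterations; i++) {
                        op.run(); // stand-in for one repository call
                        completed.incrementAndGet();
                    }
                }));
            }
        }
        try {
            for (Future<?> f : futures) {
                f.get(); // waits, and rethrows any failure from an operation
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("an operation failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long ops = run(10, 100, () -> {}, () -> {}, () -> {});
        System.out.printf("%d operations in %.1f ms%n", ops, (System.nanoTime() - start) / 1e6);
    }
}
```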

Please shed some light on concurrent access to the repository and suggest any
solutions, or point me to a link that discusses this issue.

Let me know if you need more information.

Thanks,
KS
-- 
View this message in context: http://jackrabbit.510166.n4.nabble.com/Jackrabbit-performance-in-multi-threaded-environment-tp3034980p3034980.html
Sent from the Jackrabbit - Dev mailing list archive at Nabble.com.

RE: Jackrabbit performance in multi threaded environment

Posted by "kazim_ssuet@yahoo.com" <ka...@yahoo.com>.
Search was taking too long with a where clause on a date field.
We have startDate and endDate properties on our custom file node
(an extension of nt:file). When a document is detached, we set endDate on
the custom file node.

Even though startDate and endDate were indexed, the search froze
everything while multiple threads were performing several types of operations
simultaneously. Maybe searching on date fields is not a good idea.

My select no longer has a where clause on the dates; I search for nodes
without a date filter and then filter them in my code.
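A minimal sketch of filtering by date in application code, assuming the query has already returned the candidate nodes and their startDate/endDate values have been read into memory. FileInfo and DateFilter are hypothetical names, and the "active at a given instant" rule (startDate at or before the instant, endDate unset or after it) is an assumed filter, since the original query is not shown.

```java
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical in-memory copy of the date properties read from each custom file node.
final class FileInfo {
    final String path;
    final Instant startDate;
    final Instant endDate; // null while the document is still attached

    FileInfo(String path, Instant startDate, Instant endDate) {
        this.path = path;
        this.startDate = startDate;
        this.endDate = endDate;
    }
}

final class DateFilter {
    private DateFilter() {}

    /** Keeps files with startDate <= at and (no endDate, or endDate > at). */
    static List<FileInfo> activeAt(List<FileInfo> candidates, Instant at) {
        return candidates.stream()
                .filter(f -> !f.startDate.isAfter(at))
                .filter(f -> f.endDate == null || f.endDate.isAfter(at))
                .collect(Collectors.toList());
    }
}
```

The trade-off is that every candidate node has to be loaded before it can be filtered, so this approach only pays off while the unfiltered result set stays small.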

Thanks,
KS.
-- 
View this message in context: http://jackrabbit.510166.n4.nabble.com/Jackrabbit-performance-in-multi-threaded-environment-tp3034980p3050949.html

RE: Jackrabbit performance in multi threaded environment

Posted by Jukka Zitting <jz...@adobe.com>.
Hi,

From: kazim_ssuet@yahoo.com [mailto:kazim_ssuet@yahoo.com]
> Initially I was logging into the repository with adminId/admin for
> all requests and there was a deadlock for all threads so nothing
> was moving.

Can you reproduce the deadlock? Please file a bug report about that and attach a thread dump that shows the deadlock.

The most likely cause of deadlocks in such scenarios is if you're using the same session concurrently from multiple threads (we've added protection against that in 2.2), but if I understood correctly this is not the case here.
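For comparison, the safe pattern is one session per unit of work, opened and logged out on the calling thread, which matches what the original poster describes doing. The sketch below is hypothetical: WorkSession stands in for javax.jcr.Session, and in real code the Supplier would wrap something like repository.login(credentials).

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical stand-in for javax.jcr.Session: only logout() is needed here.
interface WorkSession {
    void logout();
}

final class SessionPerCall {
    private SessionPerCall() {}

    /**
     * Opens a fresh session for one unit of work and always logs it out,
     * so each call gets its own session instead of sharing one across threads.
     */
    static <S extends WorkSession, R> R withSession(Supplier<S> login, Function<S, R> work) {
        S session = login.get();
        try {
            return work.apply(session);
        } finally {
            session.logout(); // runs even if the work throws
        }
    }
}
```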

> Then I changed the login so only threads that need to modify
> properties or load documents are logging in as admin and the rest
> are logging in as anonymous (read-only mode), things started moving
> but performance still decreases a lot when the thread count is
> increased. Even only 10 threads of each operation (load, fetch,
> modify) take a heavy toll on performance.

The Jackrabbit 2.1 (and older) architecture is optimized for the case where most concurrent reads access already cached content. Write operations and reads that access the underlying hard disk end up blocking all other repository access, which explains the performance loss you're seeing.

We've made some significant improvements for concurrent read operations in Jackrabbit 2.2 (most notably a cache miss will no longer block other readers), but write operations still require exclusive access. The optional FineGrainedISMLocking strategy can be used to avoid this exclusive lock when dealing with non-overlapping parts of the content tree.
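The difference between one exclusive write lock and locking only non-overlapping parts of the tree can be illustrated with a toy model built on ReentrantReadWriteLock. This is not how Jackrabbit's ISMLocking implementations (FineGrainedISMLocking included) actually work; SubtreeLocks and its rule of striping by top-level path segment are assumptions made only to show the effect.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Toy model only, not Jackrabbit's ISMLocking code. Each top-level path segment
// maps to one of `stripeCount` read/write locks; with stripeCount == 1 this is a
// single repository-wide lock. Distinct subtrees can still collide on a stripe.
final class SubtreeLocks {
    private final ReadWriteLock[] stripes;

    SubtreeLocks(int stripeCount) {
        if (stripeCount < 1) {
            throw new IllegalArgumentException("need at least one stripe");
        }
        stripes = new ReadWriteLock[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            stripes[i] = new ReentrantReadWriteLock();
        }
    }

    // "/docs/a/b" -> stripe for "docs"; the root path maps to the stripe for "".
    private ReadWriteLock lockFor(String path) {
        String[] parts = path.split("/");
        String top = parts.length > 1 ? parts[1] : "";
        return stripes[Math.floorMod(top.hashCode(), stripes.length)];
    }

    /** Readers of the same stripe share the lock; they wait only for a writer. */
    void read(String path, Runnable action) {
        ReadWriteLock lock = lockFor(path);
        lock.readLock().lock();
        try {
            action.run();
        } finally {
            lock.readLock().unlock();
        }
    }

    /** A writer excludes all readers and writers of its stripe. */
    void write(String path, Runnable action) {
        ReadWriteLock lock = lockFor(path);
        lock.writeLock().lock();
        try {
            action.run();
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Like write(), but returns false instead of waiting if the stripe is busy. */
    boolean tryWrite(String path, Runnable action) {
        ReadWriteLock lock = lockFor(path);
        if (!lock.writeLock().tryLock()) {
            return false;
        }
        try {
            action.run();
            return true;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

With stripeCount = 1, a tryWrite on /b/y from a second thread fails while another thread is inside write("/a/x", ...); with 16 stripes the same attempt succeeds, because "a" and "b" hash to different stripes.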

BR,

Jukka Zitting

Re: Jackrabbit performance in multi threaded environment

Posted by Raffaele Sena <ra...@gmail.com>.
If you are really curious, you can take a thread dump of your application or
your Jackrabbit server (kill -QUIT <process>) and see where the threads are
waiting.
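Besides kill -QUIT (or the JDK's jstack tool), a JVM can check itself for deadlocks through java.lang.management. A minimal sketch, with DeadlockReport as a hypothetical helper name; when clients reach Jackrabbit over RMI, it would have to run inside the server JVM, since that is where the repository's locks are held.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

final class DeadlockReport {
    private DeadlockReport() {}

    /** Describes threads deadlocked on monitors or java.util.concurrent locks; "" if none. */
    static String findDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // null when no deadlock is found
        if (ids == null) {
            return "";
        }
        StringBuilder report = new StringBuilder();
        ThreadInfo[] infos = mx.getThreadInfo(ids,
                mx.isObjectMonitorUsageSupported(), mx.isSynchronizerUsageSupported());
        for (ThreadInfo info : infos) {
            if (info != null) { // a thread may have exited since detection
                // toString() names the lock being waited on, its owner, and the top stack frames
                report.append(info);
            }
        }
        return report.toString();
    }

    public static void main(String[] args) {
        String report = findDeadlocks();
        System.out.println(report.isEmpty() ? "No deadlock detected" : report);
    }
}
```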

I had a similar problem. I'm not sure about the real cause, but my guess is that
even (and especially) if you have different sessions, when "save" requests
run in parallel Jackrabbit needs to check that they are not operating on the
same nodes (and, depending on the operation, the same parents).

As I said, I had a similar problem, but in my case I only had to rewrite
different nodes with data cached in memory, so I "solved" it by adding a
worker thread: requests are added to a queue, which frees up my main threads.
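The queue-plus-worker approach can be sketched with a single-thread executor, which runs submitted tasks one at a time from an internal queue. SingleWriter is a hypothetical name, and each Runnable is assumed to open its own session, apply one change, save and log out.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical: each submitted Runnable would open a session, apply one change, save and log out.
final class SingleWriter implements AutoCloseable {
    // One worker thread draining an unbounded FIFO queue of write requests.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    /** Queues a write and returns at once; the Future reports completion or failure. */
    Future<?> submit(Runnable write) {
        return worker.submit(write);
    }

    /** Stops accepting writes and waits (up to a minute) for queued ones to finish. */
    @Override
    public void close() {
        worker.shutdown();
        try {
            worker.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Because saves now happen one at a time on one thread, they cannot contend with each other. The costs are that callers only learn about failures through the returned Future, and Executors.newSingleThreadExecutor() uses an unbounded queue, so a sustained burst of writes accumulates in memory.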

Re: Jackrabbit performance in multi threaded environment

Posted by "kazim_ssuet@yahoo.com" <ka...@yahoo.com>.
Also, can someone explain why all threads were blocking when there was no
synchronized block around session.save(), even though every thread creates
and destroys a session per repository call? If each call uses a new session,
why do I need to synchronize session.save()? It seems like the sessions are
interacting.
-- 
View this message in context: http://jackrabbit.510166.n4.nabble.com/Jackrabbit-performance-in-multi-threaded-environment-tp3034980p3036233.html

Re: Jackrabbit performance in multi threaded environment

Posted by Ian Boston <ie...@tfd.co.uk>.
Hi,
Try upgrading to 2.1.2.
A lot of work was done to remove synchronization; however, IIRC, reads that need to load items that are not in the shared cache block other reads that need to do the same, and writes block all reads of that kind (see the non-exclusive read locks and write locks in the SharedItemManager).

Also, depending on what your ACLs do (if you have any denies), you might find that there is a system session underlying the standard AccessControlManager that is shared by all sessions and blocks all other sessions... although this might have been one of the things removed in 2.1.2.

I have found node creation that also creates a version history to be particularly expensive in a multithreaded environment, which led me to create version histories (adding mix:versionable) only when the first version is saved.

As I say, most of my observations are for 2.1.1 and may have been fixed in 2.1.2.
Ian
On 10 Nov 2010, at 14:44, kazim_ssuet@yahoo.com wrote:

> 
> Any ideas/suggestions?
> 
> I am using the Jackrabbit 2.1.1 WAR deployed on WebSphere, and API calls are
> made through RMI. Content is stored in an Oracle database.
> 
> My threads are not sharing sessions; each thread calls a
> load/fetch/modify function, and a session is created and logged out in those
> functions.
> 
> I am not using any explicit locking (except that I had to put session.save()
> in a synchronized block shared among all threads, since all threads were blocked
> without that synchronized block, as mentioned earlier).
> 
> Please help.
> 
> Thanks.
> KS
> -- 
> View this message in context: http://jackrabbit.510166.n4.nabble.com/Jackrabbit-performance-in-multi-threaded-environment-tp3034980p3036201.html


Re: Jackrabbit performance in multi threaded environment

Posted by "kazim_ssuet@yahoo.com" <ka...@yahoo.com>.
Any ideas/suggestions?

I am using the Jackrabbit 2.1.1 WAR deployed on WebSphere, and API calls are
made through RMI. Content is stored in an Oracle database.

My threads are not sharing sessions; each thread calls a
load/fetch/modify function, and a session is created and logged out in those
functions.

I am not using any explicit locking (except that I had to put session.save()
in a synchronized block shared among all threads, since all threads were blocked
without that synchronized block, as mentioned earlier).

Please help.

Thanks.
KS
-- 
View this message in context: http://jackrabbit.510166.n4.nabble.com/Jackrabbit-performance-in-multi-threaded-environment-tp3034980p3036201.html

Re: Jackrabbit performance in multi threaded environment

Posted by "kazim_ssuet@yahoo.com" <ka...@yahoo.com>.
Correction: threads started moving not because the read-only threads now log in as
anonymous, but because I've put session.save() in a synchronized block that is
shared among all threads... I don't know why I had to do that. I would
expect this synchronization to be implicit.
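The workaround described here amounts to one JVM-wide lock around every save. A minimal sketch, where Saveable is a hypothetical stand-in for javax.jcr.Session.save(): it only serializes saves made from this one JVM (other clients of the same repository are unaffected), and it masks the contention rather than explaining it, so a thread dump of the blocked state is still worth collecting.

```java
// Hypothetical stand-in for javax.jcr.Session#save() (the real method throws RepositoryException).
interface Saveable {
    void save() throws Exception;
}

final class GlobalSaveLock {
    // One lock object for the whole JVM: at most one save() runs at a time.
    private static final Object SAVE_LOCK = new Object();

    private GlobalSaveLock() {}

    static void save(Saveable session) throws Exception {
        synchronized (SAVE_LOCK) {
            session.save();
        }
    }
}
```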
-- 
View this message in context: http://jackrabbit.510166.n4.nabble.com/Jackrabbit-performance-in-multi-threaded-environment-tp3034980p3035132.html