Posted to users@jackrabbit.apache.org by Daniel Bloomfield Ramagem <da...@gmail.com> on 2007/01/24 23:33:52 UTC

Jackrabbit Performance Tuning - Large "Transaction" & Concurrent Access to Repository

I have successfully imported over 4,000 files (96 MB) in a single Jackrabbit
session with no problems, using the default repository settings.

I have a question regarding performance: I observed that during this large
import (which is pretty intensive and takes at least a minute on a fast
machine), access to other content in the same repository, via a separate
session, seems to block.  If I have previously accessed that content, it
seems to be cached and access to it is instant; otherwise, if it is newly
accessed content, my access blocks.

Is there some sort of tuning of the repository settings that would improve
this concurrent access to the repository?  Or must I break down the amount
being imported into "chunks" so as not to create one large "commit" (e.g.,
"session.save()")?  Do note that I require this atomic import behavior.

Thanks,

Daniel.

Re: Jackrabbit Performance Tuning - Large "Transaction" & Concurrent Access to Repository

Posted by Marcel Reutegger <ma...@gmx.net>.
Hi Daniel,

Daniel Bloomfield Ramagem wrote:
> For example, suppose I had previously created node A below.  I then begin
> importing B, C, D, ...  While that is happening a separate thread creates a
> new session and tries to get A.  It will get blocked until the import of
> tree B is finished storing in the repository.
> 
> Workspace
>    /   \
>   A     B
>        / \
>       C   ...

The following page has a nice little picture that shows the layering of the 
jackrabbit core:
http://jackrabbit.apache.org/doc/arch/operate/index.html

what the jira issue describes is a lock in the shared item state manager,
which is workspace-wide. when a change is committed, this shared item state
manager is locked and not even a read will go through it to retrieve an item
from the underlying persistence manager. writes are also serialized, one at
a time.
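
The blocking can be reproduced along roughly these lines (a sketch, not code
from this thread; the use of Jackrabbit's TransientRepository, the admin
credentials, the node names /A and /B and the child count are all
assumptions):

import javax.jcr.Node;
import javax.jcr.Repository;
import javax.jcr.Session;
import javax.jcr.SimpleCredentials;
import org.apache.jackrabbit.core.TransientRepository;

public class BlockingDemo {

    public static void main(String[] args) throws Exception {
        final Repository repository = new TransientRepository();
        final SimpleCredentials creds =
                new SimpleCredentials("admin", "admin".toCharArray());

        // Make sure /A exists and is persisted before the big import starts.
        Session setup = repository.login(creds);
        if (!setup.getRootNode().hasNode("A")) {
            setup.getRootNode().addNode("A");
            setup.save();
        }
        setup.logout();

        // Writer: build a large subtree /B and commit it in one save().
        Thread writer = new Thread(new Runnable() {
            public void run() {
                try {
                    Session s = repository.login(creds);
                    Node b = s.getRootNode().addNode("B");
                    for (int i = 0; i < 10000; i++) {
                        b.addNode("child" + i);
                    }
                    s.save();   // holds the workspace-wide write lock
                    s.logout();
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        });
        writer.start();
        Thread.sleep(1000);   // give the writer time to reach save()

        // Reader: a fresh session reading /A for the first time has to go
        // through the shared item state manager and may block until the
        // writer's commit releases the lock.
        Session reader = repository.login(creds);
        long start = System.currentTimeMillis();
        reader.getItem("/A");
        System.out.println("read of /A took "
                + (System.currentTimeMillis() - start) + " ms");
        reader.logout();
        writer.join();
    }
}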

> However I have also observed that if A has been previously accessed (before
> the large store operation on B) then it will be available to my concurrent
> read thread because of caching (?).

yes, that's because of the item state manager layered on top of the shared
item state manager. those are per session and contain a cache. that's why
certain sessions are still able to read A while a change by another session
is committed: those sessions already have A in their cache.
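
So, as a sketch only (the path and method names are made up), a long-lived
reader session can keep particular items readable during such a commit by
touching them before the import starts:

import javax.jcr.Node;
import javax.jcr.Session;

public class WarmReader {

    // Call this on a long-lived reader session before the big import begins.
    public static Node warm(Session readerSession, String absPath) throws Exception {
        // The first access pulls the item through the shared item state
        // manager and into this session's own item state cache ...
        Node node = (Node) readerSession.getItem(absPath);
        // ... so later reads of the same item in this session are served from
        // that cache, even while another session's save() holds the shared lock.
        return node;
    }
}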

> Does that seem like the behavior you'd expect from Jackrabbit?  I haven't
> done any strict testing and have been just informally testing these things.

yes, that's what you can expect from jackrabbit right now.

regards
  marcel

Re: Jackrabbit Performance Tuning - Large "Transaction" & Concurrent Access to Repository

Posted by Daniel Bloomfield Ramagem <da...@gmail.com>.
Hi Marcel,

Thanks for the link to the JIRA issue.  It says there that all store
operations will be serialized and that trying to read something that is
being stored will be blocked.  But I seem to be experiencing a block even
when I try to read something that was previously in the repository, before
the import.

For example, suppose I had previously created node A below.  I then begin
importing B, C, D, ...  While that is happening a separate thread creates a
new session and tries to get A.  It will get blocked until the import of
tree B is finished storing in the repository.

Workspace
   /   \
  A     B
       / \
      C   ...

However I have also observed that if A has been previously accessed (before
the large store operation on B) then it will be available to my concurrent
read thread because of caching (?).

Does that seem like the behavior you'd expect from Jackrabbit?  I haven't
done any strict testing and have been just informally testing these things.

Thanks,

Daniel.

On 1/25/07, Marcel Reutegger <ma...@gmx.net> wrote:
>
> Hi Daniel,
>
> this is a known issue / limitation with jackrabbit.
>
> See: http://issues.apache.org/jira/browse/JCR-314
>
> Daniel Bloomfield Ramagem wrote:
> > Is there some sort of tuning of the repository settings that would
> > improve this concurrent access of the repository?
>
> no, there is not.
>
> > Or must I break down the amount being imported into "chunks" so as not to
> > create one large "commit" (e.g., "session.save()")?  Do note that I
> > require this atomic import behavior.
>
> well, you can break your import into several chunks and save them
> separately, but then the import will not be atomic :-/
>
> regards
>   marcel
>

Re: Jackrabbit Performance Tuning - Large "Transaction" & Concurrent Access to Repository

Posted by Marcel Reutegger <ma...@gmx.net>.
Hi Daniel,

this is a known issue / limitation with jackrabbit.

See: http://issues.apache.org/jira/browse/JCR-314

Daniel Bloomfield Ramagem wrote:
> Is there some sort of tuning of the repository settings that would improve
> this concurrent access of the repository?

no, there is not.

> Or must I break down the amount
> being imported into "chunks" so as not to create one large "commit" ( e.g.,
> "session.save()")?  Do note that I require this atomic import behavior.

well, you can break your import into several chunks and save them separately, 
but then the import will not be atomic :-/
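
A rough sketch of such a chunked import (not code from this thread; the
chunk size, folder name and node types are illustrative). Each save() is
shorter, so readers are blocked for less time, but a failure part-way
through leaves the earlier chunks already committed:

import java.io.File;
import java.io.FileInputStream;
import java.util.Calendar;
import javax.jcr.Node;
import javax.jcr.Session;

public class ChunkedImport {

    // Same import as before, but commit every chunkSize files instead of
    // once at the end -- shorter commits, no atomicity across chunks.
    public static void importInChunks(Session session, File[] files, int chunkSize)
            throws Exception {
        Node parent = session.getRootNode().addNode("imported", "nt:folder");
        session.save();                        // make the parent visible first
        int pending = 0;
        for (int i = 0; i < files.length; i++) {
            Node file = parent.addNode(files[i].getName(), "nt:file");
            Node content = file.addNode("jcr:content", "nt:resource");
            content.setProperty("jcr:mimeType", "application/octet-stream");
            content.setProperty("jcr:lastModified", Calendar.getInstance());
            content.setProperty("jcr:data", new FileInputStream(files[i]));
            if (++pending >= chunkSize) {
                session.save();                // commit this chunk
                pending = 0;
            }
        }
        session.save();                        // flush the last partial chunk
    }
}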

regards
  marcel