Posted to users@jackrabbit.apache.org by "Seidel. Robert" <Ro...@aeb.de> on 2013/06/26 19:52:26 UTC

Write performance problems

Hi,

I'm using jackrabbit 2.4.2 and am facing performance problems. I know that write performance is a major weakness of jackrabbit, because all writes are effectively done single-threaded.

The situation is that I want to migrate data from some old software into a jackrabbit repository backed by a bundle database persistence manager (clustered environment, versionable node type). The process is to fetch a block of old data sets (maybe 500), create nodes for them, save the session, and then repeat with the next block.

The session.save() call slows the whole process down, especially its database operations. I traced the SQL statements, and here is what I got:

The following steps are repeated very often (I guess for each new node):

- Selects on the versioning bundle
- Update of the global revision (committed with the journal insert)
- Inserts and updates on the versioning bundle (committed)
- Selects on the workspace bundle
- Insert into the journal (committed)
- Update of the local revisions (auto-committed)

After this, all of the workspace bundles are saved at once, in one database transaction.

Two points:


1.    Versioning

Most of my nodes (about 90%) only ever exist in one version, but because some of them are versioned, I need the versionable node type. As a result a version history is created for all the others as well, which costs database space and write performance. Why can't the version history be created only when a node is first checked in? That would save space and time for nodes that are versionable but never actually versioned. Or is there a solution for a situation like this?

I've also done some testing with multiple sessions and multithreading. As a result, all but one thread were waiting for the exclusive read/write lock of the version manager, so no multithreading is possible, as expected.

2.    Operations for each node
The write performance could be improved by a factor of 4 or 5 if these operations were bundled into transactions the way the inserts/updates of the workspace bundles are, reducing the number of commits to a minimum. Storing all the workspace bundles takes nearly the same time as storing the versioning information for a single node (one cycle of the steps above). The updates of the global revision and local revisions could be done once per save instead of once per changed node, which would cut the time they need to a minimum.

I'm going to solve my performance problems now with multiple repositories and data splitting...

Regards, Robert

Kind regards

i. A. Robert Seidel, Software Infrastructure, Senior Professional
--
AEB GmbH
D-23552 Lübeck, Kanalstraße 62-64
Tel. +49-451-2928938-130
Fax +49-451-2928938-333
robert.seidel@aeb.de
www.aeb.de
---
AEB Gesellschaft zur Entwicklung von Branchen-Software mbH
Headquarters: Stuttgart
Court of registration: Amtsgericht Stuttgart, HRB 84 31
Place of jurisdiction: Stuttgart
Managing directors: Jochen Günzel, Markus Meißner


RE: Write performance problems

Posted by Marcel Reutegger <mr...@adobe.com>.
Hi,

> Most of my nodes exist in just one version (about 90%), but because some of
> them are versioned, I need the versionable nodetype. But for all the others a
> version history is created, consuming database space and write performance.
> Why can't the version history be created only when a node is checked in?

this is mandated by the JCR specification. as soon as a node is versionable
the repository has to create a version history for the node.

> This would save space and time, if a node is versionable but not actually
> versioned. Or is there a solution for a situation like this?

yes, there is. instead of baking the versionable node type directly into
your node type hierarchy you can assign mix:versionable on demand. IIUC
most of your nodes wouldn't be versionable initially; a node would only be
assigned mix:versionable when you need to do its first checkin.
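
A minimal sketch of the on-demand approach, assuming the standard JCR 2.0
API (javax.jcr) that jackrabbit 2.x implements. The class and method names
(LazyVersioning, checkinOnDemand) are made up for illustration, and it
assumes the node's primary type allows mix:versionable to be added as a
mixin. It needs the JCR API jar and a running repository, so treat it as an
outline rather than tested migration code:

```java
import javax.jcr.Node;
import javax.jcr.RepositoryException;
import javax.jcr.Session;
import javax.jcr.version.Version;
import javax.jcr.version.VersionManager;

/** Hypothetical helper: make a node versionable only when it is first checked in. */
public final class LazyVersioning {

    private LazyVersioning() {
    }

    /**
     * Adds mix:versionable if the node does not have it yet, persists the
     * change, and then checks the node in.
     */
    public static Version checkinOnDemand(Session session, Node node)
            throws RepositoryException {
        if (!node.isNodeType("mix:versionable")) {
            // The version history is created when this mixin is saved,
            // so nodes that never reach this point never get one.
            node.addMixin("mix:versionable");
        }
        if (session.hasPendingChanges()) {
            // checkin() requires a node without unsaved changes.
            // Note: save() persists ALL pending changes of this session.
            session.save();
        }
        VersionManager versionManager = session.getWorkspace().getVersionManager();
        return versionManager.checkin(node.getPath());
    }
}
```

With something like this, the migration itself would create plain
(non-versionable) nodes, and only the roughly 10% that actually get
versioned would pay for a version history.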

regards
 marcel