Posted to users@jackrabbit.apache.org by Francisco Carriedo Scher <fc...@gmail.com> on 2012/06/23 21:28:29 UTC

Performance issues

Hi there!

I have some questions about tuning a setup for performance. Basically, I
would like to disable everything that is not needed in order to get as much
I/O throughput as possible (nodes are written through a WebDAV session and
read back over HTTP). For example, search and versioning are not needed for
now; can the repository be configured to run without them? As for indexing,
I guess Jackrabbit's internals require it to work properly; otherwise I
would like to disable it too.

I saw these slides by Jukka Zitting
(http://www.slideshare.net/jukka/repository-performance-tuning). They
contain information about cache configuration that I only partially
understand:

Relevant caches:

   - Path to ID map (internal structure, not configurable) => This cache
   maps the URL (and later a path) to a node, right? In my case its
   entries will be replaced aggressively over time.

   - Item state caches (automatically balanced, configurable for special
   cases) => The only relevant state is that of the folder nodes. File
   nodes won't be updated, just saved, so not keeping file states in the
   cache would be great.

   - Bundle cache (default fairly low, increase for large deployments) =>
   It should be large enough to keep in cache all the files that will be
   referenced soon (100 - 300 files).

   - Also some PM-specific options (TarPM index, etc.) => Any pointers on
   where to look to get an idea?


Some of what I described or asked about may need more context. If so,
read on:

The traffic pattern the repository will receive is:

   - lots of concurrent writes (100 - 300 per second), each through its own
   session
   - lots of concurrent one-time reads (100 - 300 per second): each stored
   file is requested only once, soon after it is written (from immediately
   to 1 - 2 minutes later)
   - file sizes of 10 KB - 100 KB
   - long execution periods (2 - 3 days)
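
As a rough sketch of what these figures imply over a full run, the
arithmetic can be written out as below. It assumes steady rates for the
whole period and 1 KB = 1000 bytes; the class and method names are made up
for illustration and are not part of Jackrabbit.

```java
public class WorkloadEstimate {
    static final long SECONDS_PER_DAY = 24L * 60 * 60;

    /** Number of files written over the run, assuming a steady rate. */
    static long totalFiles(long filesPerSecond, long days) {
        return filesPerSecond * days * SECONDS_PER_DAY;
    }

    /** Sustained write throughput in KB per second. */
    static long throughputKBPerSecond(long filesPerSecond, long fileSizeKB) {
        return filesPerSecond * fileSizeKB;
    }

    /** Total stored volume in KB over the run. */
    static long totalKB(long filesPerSecond, long fileSizeKB, long days) {
        return totalFiles(filesPerSecond, days) * fileSizeKB;
    }

    public static void main(String[] args) {
        // Lower bound: 100 files/s, 10 KB each, 2 days.
        System.out.println("low:  " + totalFiles(100, 2) + " files, "
                + throughputKBPerSecond(100, 10) + " KB/s, "
                + totalKB(100, 10, 2) + " KB total");
        // Upper bound: 300 files/s, 100 KB each, 3 days.
        System.out.println("high: " + totalFiles(300, 3) + " files, "
                + throughputKBPerSecond(300, 100) + " KB/s, "
                + totalKB(300, 100, 3) + " KB total");
    }
}
```

Under these assumptions the run stores between about 17 million and 78
million files, at a sustained write rate of roughly 1 MB/s to 30 MB/s.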


So far I have made some changes to the JBoss 7 configuration:

   - 120 KB of memory per thread (requests come in through WebDAV and about
   200 HTTP threads are created; the less memory per thread, the more
   threads can be created, is this correct?)
   - 1 GB fixed heap size
   - 200 MB permgen space


As for Jackrabbit, my setup uses:

   - Datastore on disk
   - Default DerbyDB persistence manager


Finally, since the number of files to store is quite large at the rates
mentioned, I replicate the behaviour of the DataStore by building a
hash-based folder structure as I save the files (the hash of each file is
already available, so it can be used "for free"). This avoids performance
issues when there are > 1000 files under the same folder. This way the
folder/file hierarchy is not flat, and it seems to work, as the read / write
times appear to remain stable.
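
The hash-based folder layout described above could look roughly like the
following minimal sketch. It assumes the hash is available as a lowercase
hex string and uses its leading characters as folder names; the class name,
the method name and the choice of levels are hypothetical, not the actual
code used in this setup.

```java
public class HashedPath {
    /**
     * Builds a nested path from the leading characters of a hex hash,
     * e.g. levels=3, charsPerLevel=2 gives /ab/cd/ef/<full hash>.
     * Hypothetical helper, for illustration only.
     */
    static String shardedPath(String hexHash, int levels, int charsPerLevel) {
        if (levels < 0 || charsPerLevel < 1
                || hexHash.length() < levels * charsPerLevel) {
            throw new IllegalArgumentException("hash too short for layout");
        }
        StringBuilder path = new StringBuilder();
        for (int i = 0; i < levels; i++) {
            // Each level takes the next slice of the hash as a folder name.
            path.append('/')
                .append(hexHash, i * charsPerLevel, (i + 1) * charsPerLevel);
        }
        // The full hash is kept as the file node name.
        return path.append('/').append(hexHash).toString();
    }

    public static void main(String[] args) {
        System.out.println(shardedPath("abcdef0123456789", 3, 2));
        // prints /ab/cd/ef/abcdef0123456789
    }
}
```

With two hex characters per level, each folder has at most 256 child
folders, so the number of levels can be chosen to keep the files per leaf
folder well below the 1000 mentioned above.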


Thank you very much for your time!

Regards.