Posted to users@jackrabbit.apache.org by Francisco Carriedo Scher <fc...@gmail.com> on 2012/06/23 21:28:29 UTC
Performance issues
Hi there!
I have some questions about tuning a setup for performance. Basically, I
would like to disable everything that is not needed in order to get the
highest possible I/O throughput (writing nodes through a WebDAV session,
reading them back over HTTP). For example, search and versioning are not
needed for now; can the repository be configured to run without them? As for
indexing, I guess Jackrabbit's internals require it to work properly;
otherwise I would like to disable it too.
I saw these slides (http://www.slideshare.net/jukka/repository-performance-tuning)
from Jukka Zitting. They contain information about cache configuration that
I only partially understand:
Relevant caches:
- Path to ID map (internal structure, not configurable) => This cache maps
the URL (and later a path) to a node, right? In my case its entries will
turn over aggressively over time.
- Item state caches (automatically balanced, configurable for special
cases) => The only relevant state is that of the folder nodes. File nodes
won't be updated, just saved, so it would be great to avoid keeping file
states in the cache.
- Bundle cache (default fairly low, increase for large deployments) =>
Large enough to keep in cache all the files that will be referenced soon
(100 - 300 files).
- Also some PM-specific options (TarPM index, etc.) => Any pointers on where
to look to get an idea?
Some of what I described or asked above may need more explanation. If so,
keep reading:
The traffic pattern the repository will receive is:
- lots of (100 - 300 per second) concurrent writes (each through its own
session)
- lots of (100 - 300 per second) concurrent one-time reads (each stored file
is requested only once, and soon after it is written: anywhere from
immediately to 1 - 2 minutes later)
- file size of (10KB - 100KB)
- long execution periods (2 - 3 days)
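To put rough numbers on that pattern, here is a small back-of-the-envelope sketch. It uses only the figures listed above; the class and method names are made up for illustration, and it ignores protocol and repository overhead entirely.

```java
// Rough load estimate from the figures in this post (illustrative only).
final class LoadEstimate {

    // Payload bytes per second for a given write rate and file size in KB.
    static long bytesPerSecond(int filesPerSecond, int fileSizeKb) {
        return (long) filesPerSecond * fileSizeKb * 1024L;
    }

    // Number of files written over a run of the given length in days.
    static long filesPerRun(int filesPerSecond, int days) {
        return (long) filesPerSecond * 86_400L * days;
    }

    public static void main(String[] args) {
        // Low end: 100 files/s of 10 KB each, over 2 days.
        System.out.println("low:  " + bytesPerSecond(100, 10) + " B/s, "
                + filesPerRun(100, 2) + " files");
        // High end: 300 files/s of 100 KB each, over 3 days.
        System.out.println("high: " + bytesPerSecond(300, 100) + " B/s, "
                + filesPerRun(300, 3) + " files");
    }
}
```

So the write payload alone is roughly 1 MB/s to 30 MB/s, and a full run stores somewhere between about 17 million and 78 million files.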
So far I did some changes in the JBoss 7 configuration:
- 120KB of memory per thread (requests come in through WebDAV and about 200
HTTP threads are created; the less memory per thread, the more threads can
be created, is this correct?)
- 1GB fixed heap size
- 200 MB permgen space
Regarding Jackrabbit my setup works with:
- Datastore on disk
- Default DerbyDB persistence manager
Finally, since the number of files to store gets pretty large at the
mentioned rate, I replicate the behaviour of the DataStore by building a
hash-based folder structure as I save the files (the hash of each file is
already available, so it can be used "for free"). This avoids performance
issues from having > 1000 files under the same folder. The folder/file
hierarchy is no longer flat, and it seems to work, as the read / write
times appear to stay stable.
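For reference, the folder scheme can be sketched roughly as follows. This is a minimal illustration, not my actual code: the class name, the choice of SHA-1, and the three levels of two hex characters each are assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative helper: maps a content hash to a nested folder path.
final class HashedPath {

    // Builds e.g. "ab/cd/ef/abcdef..." from the leading hex pairs of the hash.
    static String forHash(String hexHash, int levels) {
        if (hexHash.length() < levels * 2) {
            throw new IllegalArgumentException("hash too short for " + levels + " levels");
        }
        StringBuilder path = new StringBuilder();
        for (int i = 0; i < levels; i++) {
            // Each level is one pair of hex characters: at most 256 children per folder.
            path.append(hexHash, i * 2, i * 2 + 2).append('/');
        }
        return path.append(hexHash).toString();
    }

    // Hex-encoded SHA-1 of the content (assumed here; any stable hash would do).
    static String sha1Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(content);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    public static void main(String[] args) {
        String hash = sha1Hex("example file content".getBytes(StandardCharsets.UTF_8));
        System.out.println(forHash(hash, 3));
    }
}
```

With two hex characters per level, no intermediate folder gets more than 256 children, and three levels spread the files over about 16.7 million leaf folders.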
Thank you very much for your time!
Regards.