Posted to users@jackrabbit.apache.org by Dietmar Gräbner <d....@berlinger.cc> on 2008/01/10 09:01:03 UTC

performance: handling large files

Hi,

we plan to use jackrabbit as the repository for a new application for which 
efficient file handling is a central requirement.

Currently we use these components:
-jackrabbit 1.3
-OraclePersistenceManager with Oracle10
-JBoss 4.2 (access jcr using jndi)

The file size may vary from 1 to 200 MB. The main operations are create, 
copy, move and read (of nt:file nodes).
I ran a test with a 120 MB file:
- create, copy: 40 seconds
- move: 0,05s
- read: 30s
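For a sense of scale, these timings can be turned into throughput. A minimal Java sketch, assuming each 40 s figure covers writing the full 120 MB once (the post does not say whether create and copy were timed separately); the class and method names are illustrative only:

```java
import java.util.Locale;

public class Throughput {
    // Throughput in MB/s for transferring sizeMb megabytes in the given number of seconds.
    static double mbPerSecond(double sizeMb, double seconds) {
        return sizeMb / seconds;
    }

    public static void main(String[] args) {
        double sizeMb = 120.0; // test file size from the post
        // Locale.ROOT keeps '.' as the decimal separator regardless of the JVM default locale.
        System.out.println(String.format(Locale.ROOT, "create or copy: %.1f MB/s", mbPerSecond(sizeMb, 40.0)));
        System.out.println(String.format(Locale.ROOT, "read:           %.1f MB/s", mbPerSecond(sizeMb, 30.0)));
    }
}
```

Under that assumption the setup moves roughly 3 MB/s for create or copy and 4 MB/s for read.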

Do you see any possibilities in modifying the setup to speed up the 
operations?

- DataStore: might it speed up the copy operation?
- BundledPersistenceManager: I assume that it has no positive effect on 
handling large files (it rather helps query performance on large trees).


thx and best regards


Dietmar


Re: performance: handling large files

Posted by Thomas Mueller <th...@gmail.com>.
Hi,

The advantage of using an embedded Java database in a Java application is
that the data does not need to be sent over TCP/IP and does not need to be
converted as much. I can't say much about MySQL; in my tests MySQL
was quite fast compared to Derby. But there are certainly faster Java
databases available than Derby.

Regards,
Thomas


On Jan 11, 2008 8:27 AM, Dietmar Gräbner <d....@berlinger.cc> wrote:
> first - thx for all comments
> ...
> >> Do you see any possibilities in modifying the setup to speed up the
> >> operations?
> >>
> >
> > yes, don't use oracle ;) embedded derby or even MySQL should provide much
> > better performance
> What's the reason for Oracle performing worse than those mentioned above?

Re: performance: handling large files

Posted by Dietmar Gräbner <d....@berlinger.cc>.
first - thx for all comments
...
>> Do you see any possibilities in modifying the setup to speed up the
>> operations?
>>     
>
> yes, don't use oracle ;) embedded derby or even MySQL should provide much
> better performance
What's the reason for Oracle performing worse than those mentioned above?





Re: performance: handling large files

Posted by Stefan Guggisberg <st...@gmail.com>.
On Jan 10, 2008 9:01 AM, Dietmar Gräbner <d....@berlinger.cc> wrote:
> Hi,
>
> we plan to use jackrabbit as repository for a new application for which
> efficient file handling is a central requirement.
>
> Currently we use these components:
> -jackrabbit 1.3
> -OraclePersistenceManager with Oracle10
> -JBoss 4.2 (access jcr using jndi)
>
> The file size may vary from 1 - 200 MB. The main operations are create,
> copy, move and read (of nt:file nodes).
> I run a test with a 120MB file:
> - create, copy: 40seconds
> - move: 0,05s
> - read: 30s
>
> Do you see any possibilities in modifying the setup to speed up the
> operations?

yes, don't use oracle ;) embedded derby or even MySQL should provide much
better performance.

cheers
stefan

>
> - DataStore: may speed up the copy operation?
> - BundledPersistenceManager: I assume that it has no positive effect on
> handling large files. (rather the query performance on large trees)
>
>
> thx and best regards
>
>
> Dietmar
>
>

Re: performance: handling large files

Posted by Marcel Reutegger <ma...@gmx.net>.
Stefan Guggisberg wrote:
> another thing worth considering: text extractors may eat up a lot of CPU power,
> especially when dealing with large files. if you don't really need full-text
> search within your files, you should probably disable the relevant text extractors.

or, if you still want text extraction but can live with it running asynchronously,
configure the SearchIndex with:

<param name="extractorPoolSize" value="2"/>

this will use two background threads for any text extraction that takes longer
than 100 milliseconds.
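The behavior described above can be modeled with the JDK's ExecutorService. This is a hypothetical sketch, not Jackrabbit's implementation: the class name, the extract method and the "null means deferred" convention are illustrative assumptions. A pool of two threads runs each extraction, the caller waits at most 100 ms, and anything slower keeps running in the background. A real indexer would also have to pick up the late result; this sketch simply leaves it in the pool.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class AsyncExtractionSketch {
    private final ExecutorService pool;
    private final long timeoutMs;

    AsyncExtractionSketch(int poolSize, long timeoutMs) {
        this.pool = Executors.newFixedThreadPool(poolSize); // e.g. 2 background threads
        this.timeoutMs = timeoutMs;                         // e.g. 100 ms
    }

    // Returns the extracted text, or null if extraction was deferred to the background.
    String extract(Callable<String> extractor) {
        Future<String> future = pool.submit(extractor);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return null; // too slow: not cancelled, the pool thread keeps working on it
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("extraction failed", e.getCause());
        }
    }

    void shutdown() {
        pool.shutdown(); // lets already-submitted extractions finish
    }

    public static void main(String[] args) {
        AsyncExtractionSketch indexer = new AsyncExtractionSketch(2, 100);
        String fast = indexer.extract(() -> "text of a small file");
        String slow = indexer.extract(() -> {
            Thread.sleep(500); // stands in for extracting text from a large file
            return "text of a large file";
        });
        System.out.println("fast: " + fast);
        System.out.println("slow: " + slow);
        indexer.shutdown();
    }
}
```

The caller is never blocked for more than the timeout, which is the trade-off Marcel describes: saves return quickly, at the price of the full text becoming searchable later.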

regards
  marcel

Re: performance: handling large files

Posted by Stefan Guggisberg <st...@gmail.com>.
On Jan 10, 2008 2:23 PM, Jukka Zitting <ju...@gmail.com> wrote:
> Hi,
>
>
> On Jan 10, 2008 11:09 AM, Thomas Mueller <th...@gmail.com> wrote:
> > > - DataStore: may speed up the copy operation?
> >
> > If a BundledPersistenceManager is used, the DataStore should speed up all
> > operations (read, write, copy). However, I can't tell you exactly how
> > much; you need to test it yourself. And according to my latest
> > tests, you need to use a BundledPersistenceManager (I suggest using a
> > database bundle PM).
>
> Regardless of the persistence manager, a copy operation with the
> DataStore should perform roughly the same as a move. In Dietmar's case
> that would be a nice drop from 40s to 0,05s. :-)

not quite correct, you forgot the create operation (create & copy take
40s...) ;)

but i agree, the data store should improve performance significantly.

another thing worth considering: text extractors may eat up a lot of CPU power,
especially when dealing with large files. if you don't really need full-text
search within your files, you should probably disable the relevant text extractors.

cheers
stefan

>
> BR,
>
> Jukka Zitting
>

Re: performance: handling large files

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Jan 10, 2008 11:09 AM, Thomas Mueller <th...@gmail.com> wrote:
> > - DataStore: may speed up the copy operation?
>
> If a BundledPersistenceManager is used, the DataStore should speed up all
> operations (read, write, copy). However, I can't tell you exactly how
> much; you need to test it yourself. And according to my latest
> tests, you need to use a BundledPersistenceManager (I suggest using a
> database bundle PM).

Regardless of the persistence manager, a copy operation with the
DataStore should perform roughly the same as a move. In Dietmar's case
that would be a nice drop from 40s to 0,05s. :-)
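Why a copy can cost about as much as a move is easier to see with a content-addressed store. The following is a minimal sketch, not Jackrabbit's DataStore API: it assumes, as a simplification, that each binary is stored once under the SHA-1 digest of its content, and the class and method names are hypothetical. Copying a node then duplicates only the short identifier, while the initial create still has to hash and write every byte (the create cost Stefan points out in his reply).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

public class DataStoreSketch {
    // Hypothetical content-addressed store: identifier -> binary content.
    private final Map<String, byte[]> records = new HashMap<>();

    // Stores the content once under its hex SHA-1 digest and returns that identifier.
    String addRecord(byte[] content) {
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("SHA-1").digest(content);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 must be available on every Java platform", e);
        }
        StringBuilder id = new StringBuilder();
        for (byte b : digest) {
            id.append(String.format("%02x", b));
        }
        String key = id.toString();
        records.putIfAbsent(key, content.clone()); // identical content is stored only once
        return key;
    }

    byte[] getRecord(String id) {
        return records.get(id);
    }

    int recordCount() {
        return records.size();
    }

    public static void main(String[] args) {
        DataStoreSketch store = new DataStoreSketch();
        // "Nodes" are modeled as a map from path to binary identifier.
        Map<String, String> nodes = new HashMap<>();

        // Create: the full content has to be hashed and written once.
        byte[] file = "pretend this is 120 MB".getBytes(StandardCharsets.UTF_8);
        nodes.put("/files/big.bin", store.addRecord(file));

        // Copy: only the identifier is duplicated, no binary data is written.
        nodes.put("/copies/big.bin", nodes.get("/files/big.bin"));

        System.out.println(store.recordCount()); // 1
    }
}
```

Both paths point at the same single record, so the copy's cost no longer depends on the file size.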

BR,

Jukka Zitting

Re: performance: handling large files

Posted by Thomas Mueller <th...@gmail.com>.
Hi,

> - DataStore: may speed up the copy operation?

If a BundledPersistenceManager is used, the DataStore should speed up all
operations (read, write, copy). However, I can't tell you exactly how
much; you need to test it yourself. And according to my latest
tests, you need to use a BundledPersistenceManager (I suggest using a
database bundle PM).

Regards,
Thomas