
Posted to users@jackrabbit.apache.org by Luiz Fernando Teston <fe...@caravelatech.com> on 2010/02/10 15:10:38 UTC

performance

Hi,


I use Jackrabbit 1.6.0 on Mac OS X in my application. I'm running a few
stress tests on it right now. I ran into a few performance problems and
I'd like to change my configuration to get optimal performance.
So here are a few questions:

* To achieve optimal performance, should I use a DatabasePersistenceManager,
a FileSystemPersistenceManager, or is there another one that performs
better? Of course I can't use the in-memory one in this case.
* I didn't find any example of a repository.xml configured for optimal
performance. Could you send me a link or an example file?
* I also didn't find any page on configuring the Lucene indexes to make
searches faster. Could you send me a link for this as well?

I'm running Jackrabbit in the same VM as the client code, and I'm not
using RMI or anything like it. It spends most of its time saving, and less
time performing queries to check whether a node already exists before
saving. Most of the code follows David's Model. Most of the data is
hierarchical, and I think most of the XPath queries rely on the hierarchy
(which I just read is bad).

Of course, I'd appreciate any other advice on how to achieve better
performance!

Regards,



Teston

Re: performance

Posted by Alexander Klimetschek <ak...@day.com>.

On Wed, Feb 10, 2010 at 15:45, Guo Du <mr...@gmail.com> wrote:
> Just curious: is there any performance difference between the query
> languages: XPath, SQL, JCR-SQL2, JCR-JQOM?

There shouldn't be any notable difference, as they are all transformed
into a Lucene query that runs on the Lucene index. There could be small
differences in parsing time, but I doubt they have an impact.

Also, some query languages offer features that others don't have. Most
notably, the new JCR-SQL2 / JQOM allow for joins, which could be a bit
slower than not using a join (but I have no experience with that yet
to judge it). But that's a bit like comparing apples and oranges ;-)

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com

Re: performance

Posted by Rakesh Vidyadharan <ra...@sptci.com>.

Yes, the bundle DB persistence manager with an embedded database worked best for me.  I ended up using H2, although the default repository configuration (Derby) is probably good enough.

Rakesh

On 10 Feb 2010, at 10:22, Luiz Fernando Teston wrote:

> Rajesh,
> 
> So, it appears that using database bundle is the best option for my current
> scenario, right?
> 
> Thanks for the feedback!
> 
> 
> Teston

Rakesh Vidyadharan
President & CEO
Sans Pareil Technologies, Inc.
http://sptci.com/



Re: performance

Posted by Luiz Fernando Teston <fe...@caravelatech.com>.

Rajesh,

So it appears that the database bundle persistence manager is the best
option for my current scenario, right?

Thanks for the feedback!


Teston

On Wed, Feb 10, 2010 at 2:07 PM, Rakesh Vidyadharan <ra...@sptci.com>wrote:

> If you are going to do a lot of writes, the filesystem option may turn out
> to be terribly slow.  That was my experience on OS X importing around 40000
> nodes.
>
> Rajesh

Re: performance

Posted by Rakesh Vidyadharan <ra...@sptci.com>.

If you are going to do a lot of writes, the filesystem option may turn  
out to be terribly slow.  That was my experience on OS X importing  
around 40000 nodes.

Rajesh


Re: performance

Posted by Luiz Fernando Teston <fe...@caravelatech.com>.

Guo,

> Just curious: is there any performance difference between the query
> languages: XPath, SQL, JCR-SQL2, JCR-JQOM?

I don't know. I'm using only XPath.

> The default repository configuration is already well optimized: embedded
> Derby + file store. You cannot get better performance by using a remote
> database such as MySQL.

I'm trying it with PostgreSQL, but I'm also going to test with the file
system to see whether it improves performance.

> Writing to disk is generally much more expensive than reading, so that's
> expected. A repository is best suited to content systems, where most
> operations are assumed to be reads; I hope your stress test reflects the
> real read/write ratio. You could greatly improve read performance by
> adding a cache layer in your application, if applicable.

We cache information a lot when possible. But this stress test I'm running
will be a massive import. So it has lots of writes and also some queries to
avoid duplicate data. It's a very uncommon scenario.

Thanks for the feedback!


Teston

> -Guo

Re: performance

Posted by Guo Du <mr...@gmail.com>.

On Wed, Feb 10, 2010 at 2:24 PM, Alexander Klimetschek <ak...@day.com> wrote:
> You can't improve that much by configuration. It depends on your
> content model and queries. The fastest query type is when looking for
> specific property values (eg. @property='value') or node types via
> element(*, my:nodetype) (which resolves to the same property lookup in
> Lucene internally), or using fulltext lookups (jcr:contains). Path
> steps are not optimized.
>

Just curious: is there any performance difference between the query
languages: XPath, SQL, JCR-SQL2, JCR-JQOM?


> I use Jackrabbit 1.6.0 on Mac OS X in my application. I'm running a few
> stress tests on it right now. I ran into a few performance problems and
> I'd like to change my configuration to get optimal performance.
> So here are a few questions:
>
> * To achieve optimal performance, should I use a DatabasePersistenceManager,
> a FileSystemPersistenceManager, or is there another one that performs
> better? Of course I can't use the in-memory one in this case.
> * I didn't find any example of a repository.xml configured for optimal
> performance. Could you send me a link or an example file?

The default repository configuration is already well optimized: embedded
Derby + file store. You cannot get better performance by using a remote
database such as MySQL.


> using RMI or something like this. It spends much time saving, and less time
> performing queries to see if a node exists or not before saving. Most of the
Writing to disk is generally much more expensive than reading, so that's
expected. A repository is best suited to content systems, where most
operations are assumed to be reads; I hope your stress test reflects the
real read/write ratio. You could greatly improve read performance by
adding a cache layer in your application, if applicable.
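
A minimal sketch of what such a cache layer could look like for the
"does this node already exist?" checks mentioned earlier in the thread.
It is plain Java with no Jackrabbit dependency; the class name
SeenKeyCache and the idea of keying it by node path are assumptions made
for illustration, not part of Jackrabbit or the JCR API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Small LRU cache of keys (for example node paths) already known to exist
 * in the repository. Hypothetical helper, not part of Jackrabbit.
 */
public class SeenKeyCache {
    private final Map<String, Boolean> seen;

    public SeenKeyCache(final int maxEntries) {
        // accessOrder=true keeps entries ordered from least to most recently
        // used, so removeEldestEntry evicts the least recently used key.
        this.seen = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns true if the key was recorded earlier and has not been evicted. */
    public synchronized boolean contains(String key) {
        return seen.get(key) != null;
    }

    /** Records a key after the node has been found or saved. */
    public synchronized void add(String key) {
        seen.put(key, Boolean.TRUE);
    }

    /** Number of keys currently held. */
    public synchronized int size() {
        return seen.size();
    }
}
```

A hit means the existence query can be skipped; a miss only means "not
known", so the application would still have to ask the repository and
then call add(). This assumes nothing else deletes those nodes while the
cache is in use.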

-Guo

Re: performance

Posted by Luiz Fernando Teston <fe...@caravelatech.com>.

Alex,


I'm going to take a look at these links. Thanks for the fast feedback!


Teston


Re: performance

Posted by Alexander Klimetschek <ak...@day.com>.

On Wed, Feb 10, 2010 at 15:10, Luiz Fernando Teston
<fe...@caravelatech.com> wrote:
> * To achieve optimal performance, should I use a DatabasePersistenceManager,
> a FileSystemPersistenceManager, or is there another one that performs
> better? Of course I can't use the in-memory one in this case.
> * I didn't find any example of a repository.xml configured for optimal
> performance. Could you send me a link or an example file?

Did you look at http://wiki.apache.org/jackrabbit/PersistenceManagerFAQ?

> * I also didn't find any page on configuring the Lucene indexes to make
> searches faster. Could you send me a link for this as well?

You can't improve that much through configuration. It depends on your
content model and queries. The fastest queries are those that look for
specific property values (e.g. @property='value') or node types via
element(*, my:nodetype) (which resolves to the same property lookup in
Lucene internally), or that use fulltext lookups (jcr:contains). Path
steps are not optimized.
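
As a rough illustration of the property and node type lookups described
above, here is a small sketch that builds such an XPath statement as a
string. The class XPathQueries, its method names and the property name
my:key are made up for the example; only the query syntax itself
(element(*, type) and the [@property='value'] predicate) is taken from
the text above.

```java
/**
 * Builds JCR XPath statements that filter on node type and a property
 * value instead of relying on path steps. Hypothetical helper; it only
 * produces strings and does not touch the repository.
 */
public class XPathQueries {

    /** Wraps a value in single quotes, doubling any embedded single quote. */
    public static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /** Produces, for example, //element(*, my:nodetype)[@my:key='value']. */
    public static String byTypeAndProperty(String nodeType, String property, String value) {
        return "//element(*, " + nodeType + ")[@" + property + "=" + quote(value) + "]";
    }
}
```

The resulting string would then be handed to the JCR query manager, e.g.
session.getWorkspace().getQueryManager().createQuery(statement, Query.XPATH).
Doubling single quotes matters during an import, where values come
from external data and may contain a quote character.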

See also http://n4.nabble.com/Explanation-and-solutions-of-some-Jackrabbit-queries-regarding-performance-td516614.html#a516614

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com