Posted to users@jackrabbit.apache.org by François Cassistat <f...@maya-systems.com> on 2010/02/26 23:12:50 UTC

jcr2spi NodeIterator.getNode() performances

Hello list,

I am using jcr2spi on top of spi2davex, with Jackrabbit 2.0 on a remote server.

I ran a query against the server. It takes:
- 900 ms to connect (this is acceptable)
- 500 ms to run an XPath query on ~25K nodes with an "order by" clause (this is good)
- 1000 ms to get the node iterator for the result (this is OK)

After that, it takes between 240 ms and 310 ms to get each node. Is this normal performance? In total, it took 16 seconds to load a 56-node result. This is unacceptable for my project.

I assume that jcr2spi makes one HTTP request for every NodeIterator.nextNode() call. Given that pinging the server with large packets shows a delay of about 70 ms, are these delays normal?

I don't know much about the details of spi2davex, but is there any way to load all these nodes (and subnodes?) in one big HTTP request?


Thank you,


François


Re: jcr2spi NodeIterator.getNode() performances

Posted by Michael Dürig <mi...@day.com>.

On 3/4/10 4:55 PM, Paco Avila wrote:
> Thanks for the info :)
>
> PS: This info should be included in the wiki.

Yes, I'll see what I can do.
Michael


Re: jcr2spi NodeIterator.getNode() performances

Posted by Paco Avila <mo...@gmail.com>.
Thanks for the info :)

PS: This info should be included in the wiki.





-- 
OpenKM
http://www.openkm.com
http://www.guia-ubuntu.org

Re: jcr2spi NodeIterator.getNode() performances

Posted by Michael Dürig <mi...@day.com>.
> I am interested in these parameters to improve Jackrabbit performance. I
> have an installation with more than 2 million documents and performance
> is currently poor :(

On the current trunk there are three parameters that can be used to tune
performance for jcr2spi/spi2davex. These are the size of the item info
cache, the size of the item cache and the depth of batch read operations.


Some background:
The item cache contains JCR items (i.e. nodes and properties). The item
info cache contains item infos. An item info is an entity representing
a node or a property on the SPI layer. The jcr2spi module receives item
infos from an SPI implementation (e.g. spi2davex) and uses them to build
up a hierarchy of JCR items.

When an item is requested from the JCR API, jcr2spi first checks whether
the item is in the item cache. If so, that item is returned. If not, the
request is passed down to the SPI. But before actually calling the SPI,
the item info cache is checked first. If this cache contains the
requested item info, the relevant part of the JCR hierarchy is built and
the corresponding JCR item is placed into the item cache. Only when the
item info cache does not contain the requested item info is a call made
to the SPI. Here the batch read depth comes into play. Since calls to
the SPI cause some latency (e.g. network round trips), the SPI may, in
addition to the item info actually requested, return additional item
infos. The batch read depth parameter specifies the depth down to which
item infos of the children of the requested item info are returned.
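
The lookup order described above can be illustrated with a toy model. This
is only a sketch and not the jcr2spi code: the class LookupSketch, its
hard-coded four-node tree and the string stand-ins for items and item
infos are all assumptions made for illustration. It counts how many
simulated SPI calls (round trips) are needed to read the same nodes with a
batch read depth of 0 and of 2.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the lookup order: item cache, then item info cache, then SPI. */
public class LookupSketch {
    // Hypothetical repository content: each path maps to its child paths.
    private final Map<String, List<String>> tree = new HashMap<>();
    private final Map<String, String> itemCache = new HashMap<>();     // path -> built "item"
    private final Map<String, String> itemInfoCache = new HashMap<>(); // path -> "item info"
    private final int batchReadDepth;
    public int spiCalls = 0; // number of simulated round trips to the SPI

    public LookupSketch(int batchReadDepth) {
        this.batchReadDepth = batchReadDepth;
        tree.put("/a", List.of("/a/b", "/a/c"));
        tree.put("/a/b", List.of("/a/b/d"));
        tree.put("/a/c", List.of());
        tree.put("/a/b/d", List.of());
    }

    /** Simulated SPI call: one round trip that returns the requested info
     *  plus the infos of its descendants down to the batch read depth. */
    private void fetchFromSpi(String path) {
        spiCalls++;
        collect(path, batchReadDepth);
    }

    private void collect(String path, int depth) {
        itemInfoCache.put(path, "info:" + path);
        if (depth == 0) {
            return;
        }
        for (String child : tree.getOrDefault(path, List.of())) {
            collect(child, depth - 1);
        }
    }

    public String getItem(String path) {
        String item = itemCache.get(path);
        if (item != null) {
            return item;                        // 1. item cache hit
        }
        if (!itemInfoCache.containsKey(path)) { // 2. item info cache miss
            fetchFromSpi(path);                 // 3. only now call the SPI
        }
        item = "item:" + path;                  // build the item from its info
        itemCache.put(path, item);
        return item;
    }

    public static void main(String[] args) {
        for (int depth : new int[] {0, 2}) {
            LookupSketch sketch = new LookupSketch(depth);
            for (String p : List.of("/a", "/a/b", "/a/c", "/a/b/d", "/a")) {
                sketch.getItem(p);
            }
            System.out.println("depth=" + depth + " spiCalls=" + sketch.spiCalls);
        }
    }
}
```

With this toy tree, depth 0 needs four simulated SPI calls and depth 2
needs only one, because the first call already fills the item info cache
with all descendants.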

Overall, the size of the item info cache and the batch read depth should
be used to optimize for the requirements of the back-end (i.e. network
and server). In general, the item info cache should be large enough to
*easily* hold all item infos from multiple batches. The batch read depth
should be a trade-off between network latency and item info cache
overhead. Finally, the item cache should be used to optimize for the
requirements of the front-end (i.e. the JCR API client). It should be
able to hold the items in the current working set of the API consumer.
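
To get a feeling for the sizes involved, here is a rough estimate of how
many item infos a single batch might contain. Everything in it is an
assumption for illustration: a uniform tree with a fixed number of
children per node, a fixed number of properties per node, and every node
in the batch contributing one node info plus its property infos. Real
repositories are not uniform, so profiling is still needed.

```java
/** Rough estimate of item infos per batch for an idealized, uniform tree. */
public class BatchSizeSketch {

    /** Nodes in one batch: the requested node plus its descendants down to the given depth. */
    public static long nodesPerBatch(int fanOut, int depth) {
        long total = 0;
        long level = 1; // number of nodes on the current level
        for (int d = 0; d <= depth; d++) {
            total += level;
            level *= fanOut;
        }
        return total;
    }

    /** Item infos in one batch, counting one info per node and one per property. */
    public static long itemInfosPerBatch(int fanOut, int depth, int propsPerNode) {
        return nodesPerBatch(fanOut, depth) * (1 + propsPerNode);
    }

    public static void main(String[] args) {
        int fanOut = 10;      // assumed children per node
        int depth = 2;        // batch read depth under consideration
        int propsPerNode = 4; // assumed properties per node
        int batchesToHold = 3; // "multiple batches" that should fit in the cache

        long perBatch = itemInfosPerBatch(fanOut, depth, propsPerNode);
        System.out.println("item infos per batch: " + perBatch);
        System.out.println("suggested minimum item info cache size: " + perBatch * batchesToHold);
    }
}
```

Under these assumptions one batch holds 111 nodes and 555 item infos, so
an item info cache meant to hold three such batches needs room for at
least 1665 entries.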

Some pointers:

Batch reading: org.apache.jackrabbit.spi.RepositoryService#getItemInfos()
org.apache.jackrabbit.spi2davex.Spi2davexRepositoryServiceFactory#PARAM_BATCHREAD_CONFIG

Item info cache size:
org.apache.jackrabbit.spi2davex.Spi2davexRepositoryServiceFactory#PARAM_ITEMINFO_CACHE_SIZE

Item cache size:
org.apache.jackrabbit.jcr2spi.Jcr2spiRepositoryFactory#PARAM_ITEM_CACHE_SIZE 


Related JIRA issues:
JCR-2497: Improve jcr2spi read performance
JCR-2498: Implement caching mechanism for ItemInfo batches
JCR-2461: Item retrieval inefficient after refresh
JCR-2499: Add simple benchmarking tools for jcr2spi read perform

Michael


Re: jcr2spi NodeIterator.getNode() performances

Posted by Paco Avila <mo...@gmail.com>.
I am interested in these parameters to improve Jackrabbit performance. I
have an installation with more than 2 million documents and performance
is currently poor :(


Re: jcr2spi NodeIterator.getNode() performances

Posted by Michael Dürig <mi...@day.com>.
François,

I have spent some time improving performance lately. See
https://issues.apache.org/jira/browse/JCR-2497 and related issues.

I was able to improve performance for our use case with these fixes.
Getting the parameters right (i.e. item cache size, item info cache size
and batch read depth) is still quite tricky, though, and requires careful
profiling.

I can provide more specific information on these parameters if required.

Michael







Re: jcr2spi NodeIterator.getNode() performances

Posted by François Cassistat <f...@maya-systems.com>.
OK, I've looked into what was going on with a packet analyzer.

As I expected, jcr2spi issues a request for each node, but I did not expect it to issue three requests for each node of my iterator (one request to get the node, and two to fetch some UUID properties (for files?)).

We need to visit each node and read three or four of its properties to render a table. In my product, we can't afford to wait 17 seconds to fetch information about 56 nodes (or 5 minutes for 1500 nodes). Is there any way to configure spi2davex to fetch the whole query result in one big HTTP request?
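
As a back-of-the-envelope check of these figures, the small sketch below divides the measured total by the number of requests. The class and method names are made up for illustration, and it assumes the requests are issued one after another with nothing else contributing to the 17 seconds.

```java
/** Rough per-request cost derived from the figures reported in this thread. */
public class RoundTripEstimate {

    /** Total number of HTTP requests for a result set. */
    public static long requests(int nodes, int requestsPerNode) {
        return (long) nodes * requestsPerNode;
    }

    /** Average time per request in milliseconds, assuming strictly sequential requests. */
    public static double msPerRequest(double totalSeconds, long requests) {
        return totalSeconds * 1000.0 / requests;
    }

    public static void main(String[] args) {
        long n = requests(56, 3);          // 56 nodes, 3 requests each
        double ms = msPerRequest(17.0, n); // 17 seconds in total
        System.out.printf("%d requests, about %.0f ms each%n", n, ms);
    }
}
```

This gives 168 requests at roughly 101 ms each, which is in the same range as the ~70 ms ping time. That suggests the time is dominated by the number of round trips rather than by the work done per request.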


Thanks,


François


