Posted to dev@jackrabbit.apache.org by Savas Triantafillou <sa...@gmail.com> on 2007/02/10 00:48:30 UTC

Jackrabbit query performance issues

Hi all,

I would like to point out two issues concerning query behaviour and
performance.

The repository against which the following queries ran contained only 500
nodes, all versionable, 340 of which were of the node type required by the
queries.

The purpose is to find the most suitable form for a query to be efficient
and fast.

1.  As the following queries show, I am trying to load all nodes of a
certain type using several query forms.

     The first one provides no information about the root path of the nodes,
nor any information about their name.

             DEBUG - QueryImpl.execute(149) | executed in 0,26 s.
(//element(*, my:object))


      The second one provides information about the node's name and is
already slower than the first one, even though it executed immediately
after the first query
      (i.e. the cache seemed not to be working) and is slightly more
specific than the first one.

               DEBUG - QueryImpl.execute(149) | executed in 0,36 s.
(//element(objectName, my:object))


      The third query is similar to the first one except for the presence of
the ordering. Is the difference in time justified only by the presence of the
ordering?

                DEBUG - QueryImpl.execute(149) | executed in 1,03 s.
(//element(*, my:object) order by @modified descending)

       The fourth query is similar to the second one with the addition of
the ordering. Taking into account the query execution times so far,
       this time seems the most reasonable.

                DEBUG - QueryImpl.execute(149) | executed in 0,58 s.
(//element(objectName, my:object) order by @modified descending)

        The fifth query is more specific concerning the path of the nodes.
It seems that the cache is working now.

                DEBUG - QueryImpl.execute(149) | executed in 0,12 s.
(/jcr:root/my:system/my:objectRoot//element(*, my:object))

         The sixth query is even more specific, yet it is slower than the
one above!

                 DEBUG - QueryImpl.execute(149) | executed in 0,25 s.
(/jcr:root/my:system/my:objectRoot//element(objectName, my:object))

         The last two queries use an even more specific path and add the
ordering.

                   DEBUG - QueryImpl.execute(149) | executed in 0,62 s.
(/jcr:root/my:system/my:objectRoot/objectNameTypeFolder//element(*,
my:object) order by @modified descending)
                  DEBUG - QueryImpl.execute(149) | executed in 0,14 s.
(/jcr:root/my:system/my:objectRoot/objectNameTypeFolder//element(objectName,
my:object) order by @modified descending)


Now, in order to get a more complete view, I changed the order of the
queries so that the more specific queries are executed first. Here are the
results:

DEBUG - QueryImpl.execute(149) | executed in 0,55 s.
(/jcr:root/my:system/my:objectRoot//element(*, my:object))
DEBUG - QueryImpl.execute(149) | executed in 0,44 s.
(/jcr:root/my:system/my:objectRoot//element(objectName, my:object))
DEBUG - QueryImpl.execute(149) | executed in 1,36 s.
(/jcr:root/my:system/my:objectRoot/objectNameTypeFolder//element(*,
my:object) order by @modified descending)
DEBUG - QueryImpl.execute(149) | executed in 0,16 s.
(/jcr:root/my:system/my:objectRoot/objectNameTypeFolder//element(objectName,
my:object) order by @modified descending)
DEBUG - QueryImpl.execute(149) | executed in 0,03 s. (//element(*,
my:object))
DEBUG - QueryImpl.execute(149) | executed in 0,30 s. (//element(objectName,
my:object))
DEBUG - QueryImpl.execute(149) | executed in 0,28 s. (//element(*,
my:object) order by @modified descending)
DEBUG - QueryImpl.execute(149) | executed in 0,11 s. (//element(objectName,
my:object) order by @modified descending)


My belief is that there is no specific rule for creating a query that
guarantees a satisfactory time, not even the most obvious one, i.e. the more
specific the query is, the faster it becomes.


2.  For each of the 340 nodes I created 40 versions and then reran the
above queries. All times tripled, which makes me think that a query of the
form

     //element(*, my:nodeType)

makes Jackrabbit search through the version nodes as well. If this is the
case, why is this happening?


I would really appreciate your thoughts, as we are using Jackrabbit as a
backend to a portal and the migration from 1.1.1 to 1.2.1 changed portal
performance dramatically.
Understanding how Jackrabbit handles queries is therefore crucial.


Thank you for your time,

Savvas

Fwd: Jackrabbit query performance issues

Posted by Savas Triantafillou <sa...@gmail.com>.
Trying to resend this email.

I would appreciate it if someone could answer the questions posed in this
email.

Thank you,

Savvas

---------- Forwarded message ----------
From: Savas Triantafillou <sa...@gmail.com>
Date: Feb 10, 2007 1:48 AM
Subject: Jackrabbit query performance issues
To: dev@jackrabbit.apache.org, users@jackrabbit.apache.org


Re: Jackrabbit query performance issues

Posted by Marcel Reutegger <ma...@gmx.net>.
Hi Savas,

Savas Triantafillou wrote:
> 1.  As you may see in the following queries, I would like to load all nodes
> of a certain type  using several forms
> 
>     The first one provides no information about the root path of the nodes,
> nor any information about their name
> 
>             DEBUG - QueryImpl.execute(149) | executed in 0,26 s.
> (//element(*, my:object))
> 
> 
>      The second one provides information about the node's name and is
> already slower than the first one, considering that it executed immediately
> after the first query
>      (i.e. cache seemed not to be working) and that it is slightly more
> specific than the first one
> 
>               DEBUG - QueryImpl.execute(149) | executed in 0,36 s.
> (//element(objectName, my:object))

This runs much faster on my jackrabbit instance.

I'm using 2000 test nodes of type nt:unstructured, each returning 21 nodes.

QueryImpl: executed in 0.14 s. (//element(node0, nt:unstructured))
QueryImpl: executed in 0.00 s. (//element(node1, nt:unstructured))
QueryImpl: executed in 0.02 s. (//element(node2, nt:unstructured))
QueryImpl: executed in 0.00 s. (//element(node3, nt:unstructured))
QueryImpl: executed in 0.02 s. (//element(node4, nt:unstructured))
QueryImpl: executed in 0.00 s. (//element(node5, nt:unstructured))
QueryImpl: executed in 0.02 s. (//element(node6, nt:unstructured))
QueryImpl: executed in 0.00 s. (//element(node7, nt:unstructured))
QueryImpl: executed in 0.02 s. (//element(node8, nt:unstructured))
QueryImpl: executed in 0.00 s. (//element(node9, nt:unstructured))

The first query is considerably slower because the path cache in the query 
handler needs to be filled.
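
Since the first execution pays for filling the cache, timings are easier to compare when a warm-up run is excluded. The following is only a minimal sketch of that idea, not Jackrabbit code: the class WarmQueryTimer and its methods are hypothetical names, and the Runnable is a stand-in for whatever code actually executes the query.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical timing helper; not part of Jackrabbit or JCR. */
class WarmQueryTimer {

    /**
     * Runs the task `warmups` times without timing it (so caches can fill),
     * then `runs` times with timing. Returns the elapsed nanoseconds of
     * each timed run, in execution order.
     */
    static List<Long> time(Runnable task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            task.run();
        }
        List<Long> nanos = new ArrayList<>();
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            nanos.add(System.nanoTime() - start);
        }
        return nanos;
    }

    /** Median of a non-empty list (upper median for even sizes). */
    static long median(List<Long> values) {
        List<Long> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        return sorted.get(sorted.size() / 2);
    }

    public static void main(String[] args) {
        // Assumption: replace this stand-in with real query execution.
        Runnable fakeQuery = () -> Math.sqrt(42.0);
        List<Long> timings = time(fakeQuery, 1, 5);
        System.out.println("median ns: " + median(timings));
    }
}
```

Reporting the median of several timed runs rather than a single measurement also reduces the influence of the execution order on the numbers.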

>      The third query is similar to first one except the presence of the
> ordering. Is the difference in time justified only by the presence of the
> ordering ?
> 
>                DEBUG - QueryImpl.execute(149) | executed in 1,03 s.
> (//element(*, my:object) order by @modified descending)
> 
>       The fourth query  is similar to the second one with the addition of
> the ordering. Taking into account query execution times so far
>       this time seems the most rational
> 
>                DEBUG - QueryImpl.execute(149) | executed in 0,58 s.
> (//element(objectName, my:object) order by @modified descending)
> 
>        The fifth query is more specific concerning the path of the nodes.
> It seems that cache seems to be working now
> 
>                DEBUG - QueryImpl.execute(149) | executed in 0,12 s.
> (/jcr:root/my:system/my:objectRoot//element(*, my:object))
> 
>         The sixth query is even more specific, yet it is slower than the
> above one!!!!

That's probably because it involves an additional AND operation: nodes with a 
certain name are intersected with nodes of a certain type, whereas the previous 
query only searches for nodes of a certain type.
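
As a purely conceptual illustration of that extra step (this is not how the Jackrabbit query handler is implemented; the class, the method names and the integer node ids are all assumptions made for the example), a type-only match needs one set of candidates, while a name-and-type match needs two sets plus an intersection:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Conceptual model only; the names and the id sets are hypothetical. */
class NameAndTypeMatch {

    /** element(*, type): the nodes of the given type are the result. */
    static Set<Integer> byType(Set<Integer> nodesOfType) {
        return new HashSet<>(nodesOfType);
    }

    /** element(name, type): two candidate sets must be intersected. */
    static Set<Integer> byNameAndType(Set<Integer> nodesWithName,
                                      Set<Integer> nodesOfType) {
        Set<Integer> result = new HashSet<>(nodesWithName);
        result.retainAll(nodesOfType); // the additional AND operation
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> named = new HashSet<>(Arrays.asList(1, 2, 3));
        Set<Integer> typed = new HashSet<>(Arrays.asList(2, 3, 4));
        System.out.println(byType(typed).size());               // 3
        System.out.println(byNameAndType(named, typed).size()); // 2
    }
}
```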

>                 DEBUG - QueryImpl.execute(149) | executed in 0,25 s.
> (/jcr:root/my:system/my:objectRoot//element(objectName, my:object))
> 
>         The last two queries differ in the presence of the ordering
> 
>                   DEBUG - QueryImpl.execute(149) | executed in 0,62 s.
> (/jcr:root/my:system/my:objectRoot/objectNameTypeFolder//element(*,
> my:object) order by @modified descending)
>                  DEBUG - QueryImpl.execute(149) | executed in 0,14 s.
> (/jcr:root/my:system/my:objectRoot/objectNameTypeFolder//element(objectName,
> my:object) order by @modified descending)
> 
> 
> Now, in order to have a more complete view, I have changed the order of the
> queries in that more specific queries are executed first. Here are the
> results
> 
> DEBUG - QueryImpl.execute(149) | executed in 0,55 s.
> (/jcr:root/my:system/my:objectRoot//element(*, my:object))
> DEBUG - QueryImpl.execute(149) | executed in 0,44 s.
> (/jcr:root/my:system/my:objectRoot//element(objectName, my:object))
> DEBUG - QueryImpl.execute(149) | executed in 1,36 s.
> (/jcr:root/my:system/my:objectRoot/objectNameTypeFolder//element(*,
> my:object) order by @modified descending)
> DEBUG - QueryImpl.execute(149) | executed in 0,16 s.
> (/jcr:root/my:system/my:objectRoot/objectNameTypeFolder//element(objectName,
> my:object) order by @modified descending)
> DEBUG - QueryImpl.execute(149) | executed in 0,03 s. (//element(*,
> my:object))
> DEBUG - QueryImpl.execute(149) | executed in 0,30 s. (//element(objectName,
> my:object))
> DEBUG - QueryImpl.execute(149) | executed in 0,28 s. (//element(*,
> my:object) order by @modified descending)
> DEBUG - QueryImpl.execute(149) | executed in 0,11 s. (//element(objectName,
> my:object) order by @modified descending)
> 
> 
> My belief is that there is no specific rule for creating a query that will
> guarantee a satisfactory time, not even the most obvious one, i.e. the more
> specific the query is,
> the faster it becomes.

This is not always the case: in some cases a more specific query is also more 
complex to execute.

> 2.  For each one of the 340 nodes I have created 40 versions and then rerun
> the above queries. All times tripled which makes me think that a query of
> type
> 
>     //element(*, my:nodeType)  will make Jackrabbit search through its
> version nodes as well. If this is the case, why this is happening?

Because the query also includes the jcr:system subtree. If you are not 
interested in nodes from the version store, you need to exclude the jcr:system 
subtree, e.g. by keeping your content under a designated node instead of 
directly under the root node. Then you can search just in your content:
/jcr:root/my:content//element(*, my:type)

OR

If you don't want versions in your query results at all, you can also disable 
indexing of versions:

- Remove or comment out the element /Repository/SearchIndex in your
  repository.xml

This change requires that you re-index all workspaces.
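
The first option, scoping the query to a content subtree, can be sketched as a small helper that builds the XPath statement. This is only an illustration: SubtreeQuery and elementsUnder are hypothetical names, the helper does no escaping of names, and /my:content and my:type are the example values from above.

```java
/** Hypothetical helper that builds a subtree-scoped XPath statement. */
class SubtreeQuery {

    /**
     * Returns /jcr:root + contentPath + //element(*, nodeType).
     * contentPath must start with "/" and must not end with "/",
     * which also rejects the bare root path "/" (it would again
     * include the jcr:system subtree).
     */
    static String elementsUnder(String contentPath, String nodeType) {
        if (!contentPath.startsWith("/") || contentPath.endsWith("/")) {
            throw new IllegalArgumentException(
                "expected an absolute path below the root: " + contentPath);
        }
        return "/jcr:root" + contentPath + "//element(*, " + nodeType + ")";
    }

    public static void main(String[] args) {
        // prints /jcr:root/my:content//element(*, my:type)
        System.out.println(elementsUnder("/my:content", "my:type"));
    }
}
```

The resulting string would then be passed to the JCR query API, e.g. QueryManager.createQuery(statement, Query.XPATH).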

> I would really appreciate your thoughts as we are using Jackrabbit as a
> backend to a portal and migration from 1.1.1 to 1.2.1 changed portal
> performance dramatically.

Can you please provide examples of queries that changed in performance between 
the two versions?

regards
  marcel