Posted to dev@jackrabbit.apache.org by Cédric Damioli <ce...@anyware-tech.com> on 2006/03/30 23:17:15 UTC

Search performance issue

Hi all,

In my repository, I have a Node named 'content' under which I have an 
arbitrary number of Nodes. Under each of these Nodes, I have one Node 
named 'fr'.

My example query is simple: I want to get all "fr" Nodes.

1) I executed the following query: "//content/*/fr". The result is correct, 
but the execution took more than 80s (the whole repository has more than 
100 000 Nodes and more than 1 000 000 properties).
2) I executed the query "//content/*", followed by a small Java loop to 
get the "fr" child Node of each result. The whole thing took only a 
couple of seconds.

Is this the normal behaviour? Does the query have to end with "/*" to be 
correctly handled by Jackrabbit?

Regards,

-- 
Cédric Damioli
ANYWARE TECHNOLOGIES
Tel : +33 (0)5 61 00 52 90
Fax : +33 (0)5 61 00 51 46
http://www.anyware-tech.com


Re: Search performance issue

Posted by Marcel Reutegger <ma...@gmx.net>.
Cédric Damioli wrote:
> IIUC, the document order flag does not affect the query execution time 
> (i.e. Query.execute()), but only the first NodeIterator.nextNode() call. 
> Or am I wrong about this?

no, this is correct.
execution of the query is the same no matter if document order is 
enabled or not. document order is established (if needed) on the result 
iterator when it is first accessed. exactly as you described.
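
To illustrate the behaviour described above, here is a minimal Java sketch of 
a result iterator that defers ordering until it is first accessed. It is not 
Jackrabbit's actual implementation: the class LazyOrderedIterator, its 
isInitialized() method, and the use of path strings as stand-ins for result 
nodes are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

/**
 * Hypothetical sketch: an iterator that sorts its hits only on first access.
 * Constructing it is cheap; the ordering cost is paid by the first
 * hasNext()/next() call. Not taken from the Jackrabbit code base.
 */
final class LazyOrderedIterator<T> implements Iterator<T> {
    private final List<T> hits;
    private final Comparator<? super T> documentOrder; // null means "keep index order"
    private Iterator<T> delegate;                       // created lazily

    LazyOrderedIterator(List<T> hits, Comparator<? super T> documentOrder) {
        this.hits = new ArrayList<>(hits); // plain copy, no sorting yet
        this.documentOrder = documentOrder;
    }

    /** True once the deferred ordering step has run. */
    boolean isInitialized() {
        return delegate != null;
    }

    private Iterator<T> delegate() {
        if (delegate == null) {
            if (documentOrder != null) {
                hits.sort(documentOrder); // the expensive part happens here
            }
            delegate = hits.iterator();
        }
        return delegate;
    }

    @Override
    public boolean hasNext() {
        return delegate().hasNext();
    }

    @Override
    public T next() {
        return delegate().next();
    }

    public static void main(String[] args) {
        List<String> hits = Arrays.asList("/content/b/fr", "/content/a/fr", "/content/c/fr");
        LazyOrderedIterator<String> result =
                new LazyOrderedIterator<>(hits, Comparator.<String>naturalOrder());
        System.out.println("ordered after construction: " + result.isInitialized());
        System.out.println("first hit: " + result.next());
        System.out.println("ordered after first access: " + result.isInitialized());
    }
}
```

With this shape, a benchmark that only times the construction step (the 
analogue of Query.execute()) would see no difference between ordered and 
unordered results; any difference shows up on the first call to next().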

> In my case, I only consider the execution time, so 
> respectDocumentOrder has no effect (I have tested with and without it 
> and the results are the same).
> So in this case, 80 seconds is indeed not acceptable.
> 
> I'll file a new issue as soon as I have finished my benchmarks, so that 
> I can give some real statistics.

ok, thanks. can you then please provide information on the structure of 
your content? I don't need the exact data, just the rough structure and 
amount of content.

regards
  marcel

Re: Search performance issue

Posted by Cédric Damioli <ce...@anyware-tech.com>.
Marcel Reutegger wrote:
> Cédric Damioli wrote:
>> In my repository, I have a Node named 'content' under which I have an 
>> arbitrary number of Nodes. Under each of these Nodes, I have one Node 
>> named 'fr'.
>>
>> My example query is simple: I want to get all "fr" Nodes.
>>
>> 1) I executed the following query: "//content/*/fr". The result is 
>> correct, but the execution took more than 80s (the whole repository 
>> has more than 100 000 Nodes and more than 1 000 000 properties).
>>
>> 2) I executed the query "//content/*", followed by a small Java loop 
>> to get the "fr" child Node of each result. The whole thing took 
>> only a couple of seconds.
>
> I assume the result set is quite large, so you should disable 
> document ordering on result nodes in the search configuration. By 
> default, result nodes are returned in document order, and establishing 
> that order cannot use information from the search index. 
> That is, all information must be loaded through the persistence 
> manager to arrange the result nodes in document order.
>
> Adding the following parameter to the SearchIndex tag in workspace.xml 
> will do the trick:
>   <param name="respectDocumentOrder" value="false"/>
>
> for more details on index configuration see also:
> http://svn.apache.org/viewcvs.cgi/jackrabbit/trunk/jackrabbit/src/main/config/repository.xml?view=markup 
>
>
> As a quick workaround, you can also append an order by clause to the 
> query; this also avoids document ordering of the result nodes:
> //content/*/fr order by jcr:score
>
> If you have already disabled document order, then 80 seconds is IMO not 
> acceptable. In that case, could you please file a Jira issue?
>
IIUC, the document order flag does not affect the query execution time 
(i.e. Query.execute()), but only the first NodeIterator.nextNode() call. 
Or am I wrong about this?
In my case, I only consider the execution time, so 
respectDocumentOrder has no effect (I have tested with and without it 
and the results are the same).
So in this case, 80 seconds is indeed not acceptable.

I'll file a new issue as soon as I have finished my benchmarks, so that 
I can give some real statistics.

Regards,

-- 
Cédric Damioli
ANYWARE TECHNOLOGIES
Tel : +33 (0)5 61 00 52 90
Fax : +33 (0)5 61 00 51 46
http://www.anyware-tech.com


Re: Search performance issue

Posted by Marcel Reutegger <ma...@gmx.net>.
Cédric Damioli wrote:
> In my repository, I have a Node named 'content' under which I have an 
> arbitrary number of Nodes. Under each of these Nodes, I have one Node 
> named 'fr'.
> 
> My example query is simple: I want to get all "fr" Nodes.
> 
> 1) I executed the following query: "//content/*/fr". The result is correct, 
> but the execution took more than 80s (the whole repository has more than 
> 100 000 Nodes and more than 1 000 000 properties).
>
> 2) I executed the query "//content/*", followed by a small Java loop to 
> get the "fr" child Node of each result. The whole thing took only a 
> couple of seconds.

I assume the result set is quite large, so you should disable 
document ordering on result nodes in the search configuration. By 
default, result nodes are returned in document order, and establishing 
that order cannot use information from the search index. 
That is, all information must be loaded through the persistence manager 
to arrange the result nodes in document order.

Adding the following parameter to the SearchIndex tag in workspace.xml will 
do the trick:
   <param name="respectDocumentOrder" value="false"/>

for more details on index configuration see also:
http://svn.apache.org/viewcvs.cgi/jackrabbit/trunk/jackrabbit/src/main/config/repository.xml?view=markup

As a quick workaround, you can also append an order by clause to the 
query; this also avoids document ordering of the result nodes:
//content/*/fr order by jcr:score
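
If the workaround has to be applied to many queries, it could be wrapped in a 
small helper along these lines. This is only a sketch: the class QueryOrdering 
and the method withScoreOrdering are hypothetical names, the clause text is 
copied verbatim from the example above, and the check for an existing order by 
clause is a naive substring test that a string literal containing those words 
would fool.

```java
import java.util.Locale;

/** Hypothetical helper, not part of Jackrabbit or the JCR API. */
final class QueryOrdering {
    // Clause text taken from the example query in this message.
    private static final String SCORE_CLAUSE = " order by jcr:score";

    private QueryOrdering() {
    }

    /**
     * Returns the XPath statement with an explicit score ordering appended,
     * unless the statement already seems to contain an order by clause.
     */
    static String withScoreOrdering(String xpath) {
        String trimmed = xpath.trim();
        // Naive detection: look for the keyword pair anywhere in the statement.
        boolean hasOrderBy = trimmed.toLowerCase(Locale.ROOT).contains(" order by ");
        return hasOrderBy ? trimmed : trimmed + SCORE_CLAUSE;
    }

    public static void main(String[] args) {
        System.out.println(withScoreOrdering("//content/*/fr"));
        System.out.println(withScoreOrdering("//content/*/fr order by @title"));
    }
}
```

The resulting string would then be passed to the query manager as usual; 
setting respectDocumentOrder in the configuration remains the more general fix.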

If you have already disabled document order, then 80 seconds is IMO not 
acceptable. In that case, could you please file a Jira issue?

> Is this the normal behaviour? Does the query have to end with "/*" to be 
> correctly handled by Jackrabbit?

no, the query can also end with a name test that is not *

regards
  marcel