Posted to dev@jackrabbit.apache.org by Cédric Damioli <ce...@anyware-tech.com> on 2006/03/30 23:17:15 UTC
Search performance issue
Hi all,
In my repository, I have a Node named 'content' under which I have an
arbitrary number of Nodes. Under each of these Nodes, I have one Node
named 'fr'.
My example query is simple: I want to get all "fr" Nodes.
1) I executed the following query : "//content/*/fr". The result is ok
but the execution took more than 80s (the whole repository has more than
100 000 Nodes and more than 1 000 000 properties)
2) I executed the query "//content/*" followed by a small Java loop for
getting the "fr" subNode of each result. The whole thing took only a
couple of seconds.
Is this the normal behaviour? Does the query have to end with "/*" to be
correctly handled by Jackrabbit?
Regards,
--
Cédric Damioli
ANYWARE TECHNOLOGIES
Tel : +33 (0)5 61 00 52 90
Fax : +33 (0)5 61 00 51 46
http://www.anyware-tech.com
Re: Search performance issue
Posted by Marcel Reutegger <ma...@gmx.net>.
Cédric Damioli wrote:
> IIUC, the document order flag does not affect the query execution time
> (i.e. Query.execute()), but only the first NodeIterator.nextNode() call.
> Or am I wrong about this?
no, this is correct.
execution of the query is the same no matter if document order is
enabled or not. document order is established (if needed) on the result
iterator when it is first accessed. exactly as you described.
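The behaviour described above (ordering deferred until the result is first accessed) can be illustrated with a small toy model. This is only a sketch and not Jackrabbit code: the class LazyOrderDemo, the LazyResult type and the execute() helper are hypothetical names, and plain lexical path order is used as a stand-in for real document order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

/** Toy model of a query result that defers ordering until first access. */
public class LazyOrderDemo {

    /** Hypothetical stand-in for a query result holding node paths. */
    static final class LazyResult implements Iterable<String> {
        private final List<String> paths;
        private final Comparator<String> order; // null: keep the order the hits arrived in
        private boolean ordered = false;

        LazyResult(List<String> paths, Comparator<String> order) {
            this.paths = new ArrayList<>(paths);
            this.order = order;
        }

        boolean isOrdered() {
            return ordered;
        }

        @Override
        public Iterator<String> iterator() {
            // The sorting cost is paid here, on first access, not in execute().
            if (order != null && !ordered) {
                paths.sort(order);
                ordered = true;
            }
            return paths.iterator();
        }
    }

    /** Hypothetical execute(): only wraps the hits, never sorts them. */
    static LazyResult execute(List<String> indexHits, boolean respectDocumentOrder) {
        // Assumption: lexical path order stands in for document order.
        Comparator<String> order =
                respectDocumentOrder ? Comparator.<String>naturalOrder() : null;
        return new LazyResult(indexHits, order);
    }

    public static void main(String[] args) {
        List<String> hits = List.of("/content/b/fr", "/content/a/fr", "/content/c/fr");

        LazyResult result = execute(hits, true);
        System.out.println(result.isOrdered());          // false: nothing sorted yet
        System.out.println(result.iterator().next());    // /content/a/fr
        System.out.println(result.isOrdered());          // true: sorted on first access

        LazyResult unordered = execute(hits, false);
        System.out.println(unordered.iterator().next()); // /content/b/fr
    }
}
```

In this model, timing execute() alone gives the same figure with or without the flag, which matches the observation that the setting made no difference to the measured execution time.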
> In my case, I only consider the execution time, so the
> respectDocumentOrder setting has no effect (I have tested with and
> without it and the results are the same).
> So in this case, 80 seconds is indeed not acceptable.
>
> I'll file a new issue as soon as I have finished my benchmarks, to
> be able to give some real statistics.
ok, thanks. can you then please provide information on the structure of
your content? I don't need the exact data, just the rough structure and
amount of content.
regards
marcel
Re: Search performance issue
Posted by Cédric Damioli <ce...@anyware-tech.com>.
Marcel Reutegger a écrit :
> Cédric Damioli wrote:
>> In my repository, I have a Node named 'content' under which I have an
>> arbitrary number of Nodes. Under each of these Nodes, I have one Node
>> named 'fr'.
>>
>> My example query is simple: I want to get all "fr" Nodes.
>>
>> 1) I executed the following query : "//content/*/fr". The result is
>> ok but the execution took more than 80s (the whole repository has
>> more than 100 000 Nodes and more than 1 000 000 properties)
>>
>> 2) I executed the query "//content/*" followed by a small Java loop
>> for getting the "fr" subNode of each result. The whole thing took
>> only a couple of seconds.
>
> I assume the result set is quite large, so you should disable
> document ordering on result nodes in the search configuration. By
> default, result nodes are returned in document order, which is an
> operation that is performed without information from the search index.
> That is, all information must be loaded through the persistence
> manager to arrange the result nodes in document order.
>
> adding the following parameter in SearchIndex tag in workspace.xml
> will do the trick:
> <param name="respectDocumentOrder" value="false"/>
>
> for more details on index configuration see also:
> http://svn.apache.org/viewcvs.cgi/jackrabbit/trunk/jackrabbit/src/main/config/repository.xml?view=markup
>
>
> as a quick workaround you can also append an order by clause to the
> query; this also avoids document ordering of the result nodes:
> //content/*/fr order by jcr:score
>
> If you already disabled document order then 80 seconds is IMO not
> acceptable. In that case could you please file a jira issue.
>
IIUC, the document order flag does not affect the query execution time
(i.e. Query.execute()), but only the first NodeIterator.nextNode() call.
Or am I wrong about this?
In my case, I only consider the execution time, so the
respectDocumentOrder setting has no effect (I have tested with and
without it and the results are the same).
So in this case, 80 seconds is indeed not acceptable.
I'll file a new issue as soon as I have finished my benchmarks, to
be able to give some real statistics.
Regards,
--
Cédric Damioli
Re: Search performance issue
Posted by Marcel Reutegger <ma...@gmx.net>.
Cédric Damioli wrote:
> In my repository, I have a Node named 'content' under which I have an
> arbitrary number of Nodes. Under each of these Nodes, I have one Node
> named 'fr'.
>
> My example query is simple: I want to get all "fr" Nodes.
>
> 1) I executed the following query : "//content/*/fr". The result is ok
> but the execution took more than 80s (the whole repository has more than
> 100 000 Nodes and more than 1 000 000 properties)
>
> 2) I executed the query "//content/*" followed by a small Java loop for
> getting the "fr" subNode of each result. The whole thing took only a
> couple of seconds.
I assume the result set is quite large, so you should disable
document ordering on result nodes in the search configuration. By
default, result nodes are returned in document order, which is an
operation that is performed without information from the search index.
That is, all information must be loaded through the persistence manager
to arrange the result nodes in document order.
adding the following parameter in SearchIndex tag in workspace.xml will
do the trick:
<param name="respectDocumentOrder" value="false"/>
for more details on index configuration see also:
http://svn.apache.org/viewcvs.cgi/jackrabbit/trunk/jackrabbit/src/main/config/repository.xml?view=markup
as a quick workaround you can also append an order by clause to the
query; this also avoids document ordering of the result nodes:
//content/*/fr order by jcr:score
If you already disabled document order then 80 seconds is IMO not
acceptable. In that case could you please file a jira issue.
> Is this the normal behaviour? Does the query have to end with "/*" to be
> correctly handled by Jackrabbit?
no, the query can also end with a name test that is not *
regards
marcel