
Posted to dev@jackrabbit.apache.org by Martin Perez <mp...@gmail.com> on 2006/01/03 10:28:17 UTC

Fwd: Strange search behaviour

Hi people, I wish you all a happy new year.

Today I was testing my repository and found some strange search
behaviour that made me think there might be a bug in the search
algorithm.

Some background first: I'm using the XMLPersistenceManager. I had one big
repository with 500 to 1000 documents, all with their content indexed
using the available text filters. Next I created a smaller repository
with only two nodes, whose content was also indexed with the available
text filters.

Then I ran an XPath query over the smaller repository, something like this:

  //*[@jcr:primaryType='nt:file' and jcr:contains(@jlib:keywords,'test')]

As you can see, it is very simple: it searches for a term in a keywords
property. That query worked fine and returned very quickly.

The problem came when I ran another query, something like this:

  //*[@jcr:primaryType='nt:resource' and jcr:contains(.,'test')]

This query searches for the same term in the nodes' binary content. It was
very, very, very slooooooow. So I debugged it and saw that the returned
NodeIterator had over 270 nodes!!! How can it return 270 nodes if the
repository has no more than 10? I suppose the query was also run over the
first repository, but in that case, is the XPath query wrong?

Thanks for your help!

Martin

Re: Strange search behaviour

Posted by Martin Perez <mp...@gmail.com>.

Thanks for the tip, Marcel. I'll try it :)


Re: Strange search behaviour

Posted by Marcel Reutegger <ma...@day.com>.

I assume that the returned nodes come from the jcr:system tree, which
contains the versions of your nodes and node representations of your
node types.
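If the extra hits do come from the system tree, one client-side workaround is to drop every result whose path lies under /jcr:system. The Java sketch below shows only that path check. The class and method names and the sample paths are hypothetical. In a real application the paths would come from calling Node.getPath() on each result node.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: removes result paths that lie in the /jcr:system tree.
class JcrSystemFilter {
    static final String SYSTEM_ROOT = "/jcr:system";

    // True for /jcr:system itself and any of its descendants,
    // but not for a sibling such as /jcr:systemBackup.
    static boolean isSystemPath(String path) {
        return path.equals(SYSTEM_ROOT) || path.startsWith(SYSTEM_ROOT + "/");
    }

    // Keeps the non-system paths, preserving their original order.
    static List<String> withoutSystemNodes(List<String> paths) {
        List<String> kept = new ArrayList<>();
        for (String path : paths) {
            if (!isSystemPath(path)) {
                kept.add(path);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Sample paths are made up for illustration.
        List<String> hits = List.of(
            "/docs/report.pdf/jcr:content",
            "/jcr:system/jcr:versionStorage/example/jcr:frozenNode/jcr:content",
            "/jcr:system/jcr:nodeTypes/nt:resource");
        System.out.println(withoutSystemNodes(hits)); // [/docs/report.pdf/jcr:content]
    }
}
```

This is a post-filter. The query still matches those nodes first, so it does not remove their cost. Narrowing the query itself, for example starting the path at /jcr:root/docs instead of //, may be the cleaner fix because the restriction then becomes part of the query. In that example, /docs is again a hypothetical path.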

Regarding performance: by default, the query handler is configured to
return nodes in document order. This means that the query handler reads
all result nodes and sorts them in the order in which they appear in the
workspace. In your case, with an XMLPersistenceManager, this might not be
the most efficient setup ;)
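The cost difference can be illustrated with a simplified model. This Java sketch is hypothetical and is not Jackrabbit's actual query code. A NodeStore class stands in for the persistence manager and counts node reads.

With document order enabled, the model needs every hit's workspace position before it can hand out the first result, so it reads all hits up front. With document order disabled, it returns hits in the order the index delivered them and reads each node only when the iterator reaches it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical model, not Jackrabbit code: compares eager document-order
// sorting with lazy delivery of hits in index order.
class DocumentOrderModel {

    // Stand-in for a persistence manager; counts how many nodes were read.
    static class NodeStore {
        int reads = 0;
        private final List<String> workspaceOrder; // node ids as they appear in the workspace

        NodeStore(List<String> workspaceOrder) {
            this.workspaceOrder = workspaceOrder;
        }

        // Models one (possibly slow) node read and returns the node's workspace position.
        int load(String nodeId) {
            reads++;
            return workspaceOrder.indexOf(nodeId);
        }
    }

    // Document order on: every hit is read up front so it can be sorted by position.
    static List<String> inDocumentOrder(List<String> hits, NodeStore store) {
        Map<String, Integer> position = new HashMap<>();
        for (String id : hits) {
            position.put(id, store.load(id));
        }
        List<String> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingInt(n -> position.get(n)));
        return sorted;
    }

    // Document order off: hits keep the index's order and each node is
    // read only when the caller asks for it.
    static Iterator<String> inIndexOrder(List<String> hits, NodeStore store) {
        Iterator<String> it = hits.iterator();
        return new Iterator<String>() {
            @Override
            public boolean hasNext() {
                return it.hasNext();
            }

            @Override
            public String next() {
                String id = it.next();
                store.load(id); // read on demand
                return id;
            }
        };
    }

    public static void main(String[] args) {
        List<String> workspace = List.of("a", "b", "c", "d", "e");
        List<String> hits = List.of("d", "a", "e"); // order in which the index returned them

        NodeStore eager = new NodeStore(workspace);
        System.out.println(inDocumentOrder(hits, eager) + " reads=" + eager.reads);

        NodeStore lazy = new NodeStore(workspace);
        Iterator<String> results = inIndexOrder(hits, lazy);
        String first = results.next();
        System.out.println(first + " reads=" + lazy.reads);
    }
}
```

Running main prints "[a, d, e] reads=3" and then "d reads=1". The eager variant has read every hit before returning anything, while the lazy one has read only the node it returned. With hundreds of hits and a slow persistence manager, that up-front cost is what switching off document order avoids. The trade-off is that results no longer come back in workspace order.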

You can disable document order for result nodes with the following
parameter in the SearchIndex element:

    <param name="respectDocumentOrder" value="false"/>

If you still have performance problems or think that the query returns
wrong results, please file a JIRA issue with instructions on how to
reproduce them.

regards
marcel
