Posted to users@jackrabbit.apache.org by Jan Grathwohl <ja...@kontrast.de> on 2008/05/28 12:19:19 UTC

NodeIterator drops nodes when sorting

Hi,

we are having the problem in our application that an XPath query does  
not return all results that should be in it. Some debugging through  
the Jackrabbit internals showed us that the NodeIterator from the  
query result received the right UUIDs from the Lucene index, but then  
removed some of them from the list because the sorting of the results  
failed.

The situation is that some nodes in the result list have ancestors
that are not all accessible to the Session (they are blocked by our
AccessManager). The query returns a DocOrderNodeIteratorImpl that
contains these nodes, and this iterator tries to sort the nodes before
the first method call that accesses them. The comparator then gets an
AccessDeniedException from getAncestor() on one of the nodes, and the
nodes with inaccessible ancestors are removed from the node list. It
also looks like the comparator removes both compared nodes from the
result list if either of them throws an exception during comparison.

Is this the intended behaviour, that nodes are not returned by a query
when they cannot be sorted? And is it generally supported in
Jackrabbit to have nodes whose ancestors are not accessible?

We could work around that by turning off the sorting of the nodes, as
we don't need sorted query results here. Is there a way to achieve this
through the JCR or Jackrabbit API? We are currently doing this by
accessing the private
org.apache.jackrabbit.core.query.lucene.QueryImpl object from the
query result through Java reflection, and then calling
setRespectDocumentOrder(false) on it. But maybe there is a nicer way
(as probably almost any way would be nicer) to achieve the same result?

I will attach the XPath query and Exception stack trace from our log  
file.

Best regards and Thanks,

Jan


15:55:53,194 DEBUG [tcmdataaccess] QueryString is: //element(*, tcs:category) [fn:lower-case(@tcs:defaultContentType) = 'information'] /element(*, tcs:categorylocalization) [ @tcs:locview = 'pngo' and @tcs:loclanguage = 'de']
15:56:41,836 ERROR [DocOrderNodeIteratorImpl] Exception while sorting nodes in document order: javax.jcr.AccessDeniedException: cannot read item 827cae10-ad2e-44ad-927f-a65e96e0d4f2
javax.jcr.AccessDeniedException: cannot read item 827cae10-ad2e-44ad-927f-a65e96e0d4f2
	at org.apache.jackrabbit.core.ItemManager.getItem(ItemManager.java:392)
	at org.apache.jackrabbit.core.ItemManager.getNode(ItemManager.java:350)
	at org.apache.jackrabbit.core.ItemImpl.getAncestor(ItemImpl.java:1403)
	at org.apache.jackrabbit.core.query.lucene.DocOrderNodeIteratorImpl$1.compare(DocOrderNodeIteratorImpl.java:220)
	at java.util.Arrays.mergeSort(Arrays.java:1284)
	at java.util.Arrays.mergeSort(Arrays.java:1295)
	at java.util.Arrays.sort(Arrays.java:1223)
	at org.apache.jackrabbit.core.query.lucene.DocOrderNodeIteratorImpl.initOrderedIterator(DocOrderNodeIteratorImpl.java:172)
	at org.apache.jackrabbit.core.query.lucene.DocOrderNodeIteratorImpl.hasNext(DocOrderNodeIteratorImpl.java:131)
	at kontrast.toshiba.datastore.accessimpl.content.search.ContentSearchImpl.getContentList(ContentSearchImpl.java:267)
	at kontrast.toshiba.datastore.accessimpl.content.search.ContentSearchImpl.findContents(ContentSearchImpl.java:187)
	at kontrast.toshiba.datastore.accessimpl.content.search.ContentSearchImpl.performQuery(ContentSearchImpl.java:117)
	at kontrast.toshiba.datastore.accessimpl.content.search.ContentSearchImpl.getResults(ContentSearchImpl.java:89)

RE: NodeIterator drops nodes when sorting

Posted by Ard Schrijvers <a....@onehippo.com>.

Hello,

> 
> To clarify the situation: I have a node with the path a/b/c 
> in the XPath query result, node c can be accessed, but a and 
> b cannot. But the node c will then also be removed from the 
> result list, because the document order cannot be created 
> because of its unauthorized ancestors. So I have the 
> situation that node c will be removed from the result 
> although is is an authorized and accessible node, only its 
> ancestors are not.

I think this is part of *how* you implemented the AccessManager.

> > Which version of Jackrabbit are you using? First of all, for
> > Jackrabbit > 1.5 the default setting for respectDocumentOrder
> > will be false. For < 1.5 it is true. You can configure it to
> > false/true by adding
> >
> > <param name="respectDocumentOrder" value="false"/>
> >
> > to your <SearchIndex> element in repository.xml
> 
> I use Jackrabbit 1.4. I implemented the solution that
> Sébastien suggested, adding an "order by @jcr:score" to the
> query. It works very well for me.

Perfect. 

Ard

> 
> Regards and Thanks,
> 
> Jan
> 
> 
>

Re: NodeIterator drops nodes when sorting

Posted by Jan Grathwohl <ja...@kontrast.de>.

Hi,

>> The situation is that we have some nodes in the results list
>> where not all of the node's ancestors are accessible for the
>> Session (blocked by our AccessManager). We receive a
>> DocOrderNodeIteratorImpl from the query that contains these
>> nodes, and this iterator tries to sort the nodes before the
>> first method call that accesses them. The comparator then
>> gets an AccessDeniedException from the getAncestor() of one
>> of the nodes, and removes these nodes with inaccessible
>
> This seems correct to me, isn't it?

Is it? I really don't know, but it is at least a behaviour that I was
not aware of.

To clarify the situation: I have a node with the path a/b/c in the
XPath query result. Node c can be accessed, but a and b cannot. Node c
will then also be removed from the result list, because the document
order cannot be created due to its unauthorized ancestors. So node c
is removed from the result although it is an authorized and accessible
node; only its ancestors are not.

If I think it through, it is somehow logical in itself: the default
for a missing "order by" specification is to create document order ->
document order cannot be created if ancestors are not accessible ->
node is not in the result. But it is a kind of pitfall when you're
not aware of it.

>> ancestors from the node list. It also looks like the
>> Comparator directly removes both compared nodes from the
>> result list if one of them throws an Exception when being compared.
>
> And this is the actual error/problem, isn't it? If correct, only the
> node that cannot be accessed should be removed, right?

Yes, I would also consider that to be wrong, it should not behave  
like that.
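
To make the expected behaviour concrete, here is a minimal sketch of a sort that drops only the item whose own lookup fails. It is not Jackrabbit's implementation: the class DocOrderSketch, the Item type and its orderKey() method are hypothetical stand-ins, and plain path order is used in place of real document order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration; none of these types exist in Jackrabbit. */
final class DocOrderSketch {

    /** Stand-in for a result node whose ancestor lookup may be denied. */
    static final class Item {
        final String path;
        final boolean ancestorsReadable;

        Item(String path, boolean ancestorsReadable) {
            this.path = path;
            this.ancestorsReadable = ancestorsReadable;
        }

        /** Simulates the ancestor lookup; throws when an ancestor is blocked. */
        String orderKey() {
            if (!ancestorsReadable) {
                throw new IllegalStateException("cannot read ancestor of " + path);
            }
            return path;
        }
    }

    /** Resolves each key on its own, so one failure never removes another item. */
    static List<String> sortKeepingComparable(List<Item> items) {
        List<String> keys = new ArrayList<>();
        for (Item item : items) {
            try {
                keys.add(item.orderKey());
            } catch (IllegalStateException e) {
                // Only the failing item is skipped; the others stay in the list.
            }
        }
        // Assumption: lexical path order stands in for document order here.
        Collections.sort(keys);
        return keys;
    }

    public static void main(String[] args) {
        List<Item> items = new ArrayList<>();
        items.add(new Item("/x/y", true));
        items.add(new Item("/a/b/c", false));
        items.add(new Item("/d/e", true));
        System.out.println(sortKeepingComparable(items));
    }
}
```

Resolving the sort key per item before sorting keeps the failure local to that item, whereas a comparator that throws cannot tell which of its two arguments caused the problem.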

> Which version of Jackrabbit are you using? First of all, for
> Jackrabbit > 1.5 the default setting for respectDocumentOrder will be
> false. For < 1.5 it is true. You can configure it to false/true by
> adding
>
> <param name="respectDocumentOrder" value="false"/>
>
> to your <SearchIndex> element in repository.xml

I use Jackrabbit 1.4. I implemented the solution that Sébastien
suggested, adding an "order by @jcr:score" to the query. It works very
well for me.

Regards and Thanks,

Jan

RE: NodeIterator drops nodes when sorting

Posted by Ard Schrijvers <a....@onehippo.com>.

Hello,

> The situation is that we have some nodes in the results list 
> where not all of the node's ancestors are accessible for the 
> Session (blocked by our AccessManager). We receive a 
> DocOrderNodeIteratorImpl from the query that contains these 
> nodes, and this iterator tries to sort the nodes before the 
> first method call that accesses them. The comparator then 
> gets an AccessDeniedException from the getAncestor() of one 
> of the nodes, and removes these nodes with inaccessible 

This seems correct to me, isn't it?

> ancestors from the node list. It also looks like the 
> Comparator directly removes both compared nodes from the 
> result list if one of them throws an Exception when being compared.

And this is the actual error/problem, isn't it? If correct, only the node
that cannot be accessed should be removed, right?

> 
> Is this wanted behaviour, that nodes won't be returned by a 
> query when they cannot be sorted? And is it generally 
> supported in JackRabbit to have nodes whose ancestors are not 
> accessible?
> 
> We could work around that by turning off the sorting of the 
> nodes, we don't need sorted query results here. Is there a 
> way to achieve this through JCR or Jackrabbit API? We are 
> currently doing this by accessing the private 
> org.apache.jackrabbit.core.query.lucene.QueryImpl object from 
> the query result through Java reflection, and then calling a
> setRespectDocumentOrder(false) on it. But maybe there is a 
> nicer way (as probably almost any way would be nicer) to 
> achieve the same result?

Which version of Jackrabbit are you using? First of all, for Jackrabbit
versions > 1.5 the default setting for respectDocumentOrder will be
false. For versions < 1.5 it is true. You can configure it to false/true
by adding

<param name="respectDocumentOrder" value="false"/>

to your <SearchIndex> element in repository.xml

Regards Ard

> 
> I will attach the XPath query and Exception stack trace from 
> our log file.
> 
> Best regards and Thanks,
> 
> Jan
> 
>

Re: NodeIterator drops nodes when sorting

Posted by Jan Grathwohl <ja...@kontrast.de>.

Hi Sébastien,

"order by @jcr:score" is exactly what I was looking for.

Thank you very much.

Jan


On 28.05.2008 at 20:12, Sébastien Launay wrote:

> Hi Jan,
>
> By setting the JavaBean property respectDocumentOrder to false
> on SearchIndex you can disable document order sorting of query
> results:
> <SearchIndex  
> class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>  ...
>  <param name="respectDocumentOrder" value="false"/>
>  ...
> </SearchIndex>
>
> Indeed, QueryImpl instances will be injected with this property.
> This can also be forced using "order by @jcr:score" in the query.
>
> But this does not change the fact that unauthorized nodes will
> not be retrieved from the query and will certainly log warnings
> (from what I have seen, AccessDeniedException is not treated
> differently from a RepositoryException).
> Another important behavior is that the NodeIterator#getSize()
> implementation may (and will in your case) decrement while iterating
> over the nodes.
>
> I think this is needed to get the best performance.
>

Re: NodeIterator drops nodes when sorting

Posted by Sébastien Launay <se...@anyware-tech.com>.

Hi Jan,

By setting the JavaBean property respectDocumentOrder to false
on SearchIndex you can disable document order sorting of query results:
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  ...
  <param name="respectDocumentOrder" value="false"/>
  ...
</SearchIndex>

Indeed, QueryImpl instances will be injected with this property.
This can also be forced using "order by @jcr:score" in the query.
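
As a minimal sketch of the query-side variant, the helper below appends the clause to an XPath statement when it has no ordering yet. The class ScoreOrderQuery and its method withScoreOrder are hypothetical names and not part of JCR or Jackrabbit, and the "order by" check is a naive assumption that the words do not occur inside a string literal.

```java
import java.util.Locale;

/** Hypothetical helper; not part of the JCR or Jackrabbit API. */
final class ScoreOrderQuery {

    private static final String ORDER_BY_SCORE = " order by @jcr:score";

    /** Appends "order by @jcr:score" unless the statement already has an order by clause. */
    static String withScoreOrder(String xpath) {
        String trimmed = xpath.trim();
        // Naive check: assumes " order by " never appears inside a quoted literal.
        if (trimmed.toLowerCase(Locale.ROOT).contains(" order by ")) {
            return trimmed;
        }
        return trimmed + ORDER_BY_SCORE;
    }

    public static void main(String[] args) {
        String query = "//element(*, tcs:category)/element(*, tcs:categorylocalization)"
                + "[@tcs:locview = 'pngo' and @tcs:loclanguage = 'de']";
        // The result would then be passed to QueryManager.createQuery(statement, Query.XPATH).
        System.out.println(withScoreOrder(query));
    }
}
```

The advantage over the configuration parameter is that it affects only the queries that opt in, and it needs no change to repository.xml.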

But this does not change the fact that unauthorized nodes will
not be retrieved from the query and will certainly log warnings
(from what I have seen, AccessDeniedException is not treated
differently from a RepositoryException).
Another important behavior is that the NodeIterator#getSize()
implementation may (and will in your case) decrement while iterating
over the nodes.

I think this is needed to get the best performance.
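
Given that the reported size can shrink during iteration, one defensive pattern is to collect the results first and count afterwards. The sketch below works on any java.util.Iterator (which javax.jcr.NodeIterator extends); the class IterationCount and its drain method are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical helper; not part of the JCR or Jackrabbit API. */
final class IterationCount {

    /** Copies everything the iterator actually yields into a list. */
    static <T> List<T> drain(Iterator<T> it) {
        List<T> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        // A plain list iterator stands in for a query result iterator here.
        List<String> nodes = drain(Arrays.asList("a", "b", "c").iterator());
        // The size of the drained list is the number of items really returned.
        System.out.println(nodes.size());
    }
}
```

This trades memory for a stable count, so it is only reasonable for result sets that are small enough to hold in a list.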
