Posted to dev@jackrabbit.apache.org by David Caruana <da...@alfresco.org> on 2005/09/22 18:15:32 UTC

QueryResult Clarification

I have some questions regarding QueryResult.getRows() and QueryResult.getNodes().

From a brief scan of the Jackrabbit implementation, I believe it is possible to call each of the above methods repeatedly on a given QueryResult instance and receive a new RowIterator or NodeIterator each time.  From reading the JCR spec, I would assume this behaviour too, but it is not explicit.

However, from an implementation perspective, doesn't this mean that either the complete result set has to be kept somewhere (in memory or on disk) or the query has to be re-executed for each call to getRows() or getNodes()?  I think Jackrabbit holds it in memory; has this been an issue for large result sets?

I only ask because I'm currently implementing the QueryManager façade onto our own repository implementation and can go either way. Or is the spec more flexible, meaning that getRows()/getNodes() can return the same iterator, and Query.execute() has to be called again to get a new one?

Regards,
David Caruana
www.alfresco.org

Re: QueryResult Clarification

Posted by Marcel Reutegger <ma...@gmx.net>.

Hi David,

David Caruana wrote:
> From a brief scan of the Jackrabbit implementation, I believe it is
> possible to repeatedly call each of the above methods on a given
> QueryResult instance and in each case receive a new RowIterator or
> NodeIterator.  From reading the JCR spec., I would assume this
> behaviour too but it is not explicit.

The spec does not explicitly say that you get a new iterator whenever
you call getNodes(), but I think that is in line with the general pattern
of how iterators are used and created. E.g. the javadoc for
Collection.iterator() also just says 'Returns an iterator ...' instead
of 'Returns a new iterator ...'.
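
To illustrate the convention with plain java.util collections (a minimal
sketch only; IteratorPatternSketch and drain() are made-up names, and nothing
here comes from the JCR spec or Jackrabbit), each call to iterator() is
expected to hand back a fresh iterator, so the same source can be walked
more than once:

```java
import java.util.Iterator;

class IteratorPatternSketch {
    /** Counts the elements by draining a freshly obtained iterator. */
    static int drain(Iterable<String> source) {
        int count = 0;
        // source.iterator() is called once per drain(); the iterator is
        // used up here, but the source itself is left untouched.
        for (Iterator<String> it = source.iterator(); it.hasNext(); it.next()) {
            count++;
        }
        return count;
    }
}
```

Calling drain() twice on the same list gives the same count both times,
because the second call starts from a new iterator rather than the
exhausted one.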

> However, from an implementation perspective, doesn't this mean that
> either the complete result set has to be kept somewhere
> (memory/disk..) or the query re-executed for each call to getRows or
> getNodes?  I think Jackrabbit holds it in memory; has this been an
> issue for large result sets?

Jackrabbit keeps the UUIDs of the result nodes in memory. So far this
has not posed any problem, though I have only tested it with a couple of
thousand result nodes. The actual Node instances are only created
when needed.
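
The approach described above could be sketched roughly as follows. This is
not Jackrabbit's actual code: LazyIdResult and the loader function are
hypothetical stand-ins, and the class deliberately does not implement the
javax.jcr interfaces, so that it stays self-contained. Only the identifiers
are retained, and every call to getNodes() returns a new iterator that
resolves each node on demand:

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

/** Hypothetical stand-in for a query result; not the javax.jcr API. */
class LazyIdResult<N> {
    private final List<String> ids;            // only identifiers are kept in memory
    private final Function<String, N> loader;  // assumed hook: resolves an id to a node

    LazyIdResult(List<String> ids, Function<String, N> loader) {
        this.ids = ids;
        this.loader = loader;
    }

    /** Returns a new, independent iterator on every call. */
    Iterator<N> getNodes() {
        final Iterator<String> idIt = ids.iterator();
        return new Iterator<N>() {
            @Override
            public boolean hasNext() {
                return idIt.hasNext();
            }

            @Override
            public N next() {
                // The node object is only created when the caller asks for it.
                return loader.apply(idIt.next());
            }
        };
    }
}
```

The memory cost then grows with the number of matching identifiers rather
than with the size of the nodes themselves.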

> I only ask as I'm currently implementing the QueryManager façade onto
> our own repository implementation and can go either way - or is the
> spec. more flexible? meaning that getRows()/getNodes() can return the
> same Iterator, and Query.execute() has to be called again to get a new
> Iterator.

I personally think that the spec is not flexible in that respect. I
would suggest that the query be executed again inside getRows()/getNodes()
in case the result is too large to hold. At least that's what I would do in
Jackrabbit if the current implementation should hit its limits. If you
keep a reference to the Query in the QueryResult, this should be possible
quite easily.
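
A rough sketch of that idea, again with made-up names (ReexecutingResult and
its Supplier are hypothetical and stand in for a QueryResult that holds a
reference to its Query): the result stores no rows at all and simply runs
the query again each time an iterator is requested.

```java
import java.util.Iterator;
import java.util.function.Supplier;

/** Hypothetical result that keeps a handle on its query instead of the rows. */
class ReexecutingResult<N> {
    // Assumed hook: runs the underlying query and streams the matches.
    private final Supplier<Iterator<N>> query;

    ReexecutingResult(Supplier<Iterator<N>> query) {
        this.query = query;
    }

    /** Re-runs the query, so every call yields a fresh iterator. */
    Iterator<N> getNodes() {
        return query.get();
    }
}
```

One consequence to keep in mind: if the repository content changes between
two calls, the two iterators may no longer return the same nodes.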

regards
  marcel