Posted to dev@jackrabbit.apache.org by Christoph Kiehl <ch...@sulu3000.de> on 2007/08/16 19:09:31 UTC

Total size of a query result and setLimit()

Hi,

while fixing a little bug in rev 566778 I became aware that it is no longer 
possible to retrieve the total result size of a query if setLimit() is used. 
But I need that information, and I think I'm not alone. The question is how to 
implement this. Should this maybe even be covered by JSR 283? The method could 
be implemented either on the LazyScoreNodeIterator (RangeIterator) or, perhaps 
more appropriately, on LazyQueryResultImpl/QueryResultImpl (QueryResult), 
because limits are specific to querying. WDYT?

Cheers,
Christoph


Re: Total size of a query result and setLimit()

Posted by Christoph Kiehl <ch...@sulu3000.de>.
Marcel Reutegger wrote:

> Hmm, after some thinking here's another proposal:
> 
> keep the setLimit() as is. but introduce a getSize() (or 
> getTotalMatches()?) on the QueryResult. This method always returns the 
> total number of nodes/rows independent of setLimit().

That was what I was trying to suggest in my original post ;) I would go for 
getTotalSize() as Thomas suggested.

Cheers,
Christoph


Re: Total size of a query result and setLimit()

Posted by Christoph Kiehl <ch...@sulu3000.de>.
Jukka Zitting wrote:
> Hi,
> 
> On 8/17/07, Christoph Kiehl <ch...@sulu3000.de> wrote:
>> Thomas Mueller wrote:
>>> That's a good idea! Implementations that can't support it efficiently
>>> could then calculate the size only when required. What about
>>> getTotalSize()?
>> Implementations should maybe even be allowed to return -1 (as on
>> RangeIterator.getSize()) if they do not support this method ...
> 
> I don't like the -1 result. As long as it's allowed, an interoperable
> client must always assume that an implementation may return -1 and
> provide a workaround for such cases.

I don't like it either, but I have absolutely no idea what kinds of systems are 
trying to implement the JCR API, or whether they are capable of providing a 
total result size. Until now, without setLimit(), they weren't forced to return 
the total size in RangeIterator.getSize(), so I thought it might be necessary to 
provide that option for other implementors, because at least one of them seems 
to need that -1 return value. But if it isn't necessary, I would definitely like 
to skip that option.

> How is the total size question typically solved in cases where an
> application pages through a large database result set? I recall
> sometimes using a separate COUNT(*) query for that, but there may be
> more efficient alternatives.

That's what Thomas suggested as well and I know of no other option.

Cheers,
Christoph


Re: Total size of a query result and setLimit()

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 8/17/07, Christoph Kiehl <ch...@sulu3000.de> wrote:
> Thomas Mueller wrote:
> > That's a good idea! Implementations that can't support it efficiently
> > could then calculate the size only when required. What about
> > getTotalSize()?
>
> Implementations should maybe even be allowed to return -1 (as on
> RangeIterator.getSize()) if they do not support this method ...

I don't like the -1 result. As long as it's allowed, an interoperable
client must always assume that an implementation may return -1 and
provide a workaround for such cases.

How is the total size question typically solved in cases where an
application pages through a large database result set? I recall
sometimes using a separate COUNT(*) query for that, but there may be
more efficient alternatives.
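
To illustrate the workaround an interoperable client would need if -1 were
allowed, here is a minimal sketch. SizedIterator, wrap() and sizeOf() are
hypothetical names made up for this illustration and are not part of JCR or
Jackrabbit; only the "-1 means unknown" convention is taken from
RangeIterator.getSize().

```java
import java.util.Iterator;
import java.util.List;

public class SizeFallbackSketch {

    /** Hypothetical stand-in for an iterator that may not know its size. */
    public interface SizedIterator<T> extends Iterator<T> {
        /** Returns the number of items, or -1 if the size is unknown. */
        long getSize();
    }

    /**
     * Returns the reported size if known. Otherwise counts by iterating,
     * which consumes the iterator and touches every remaining item.
     */
    public static <T> long sizeOf(SizedIterator<T> it) {
        long reported = it.getSize();
        if (reported >= 0) {
            return reported;
        }
        long count = 0;
        while (it.hasNext()) {
            it.next();
            count++;
        }
        return count;
    }

    /** Builds a test iterator over a list that may or may not report its size. */
    public static <T> SizedIterator<T> wrap(List<T> items, boolean knowsSize) {
        Iterator<T> delegate = items.iterator();
        return new SizedIterator<T>() {
            @Override public boolean hasNext() { return delegate.hasNext(); }
            @Override public T next() { return delegate.next(); }
            @Override public long getSize() { return knowsSize ? items.size() : -1; }
        };
    }
}
```

The fallback branch is the cost in question: every client has to carry it,
and it reads the whole result just to learn the count.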

BR,

Jukka Zitting

Re: Total size of a query result and setLimit()

Posted by Christoph Kiehl <ch...@sulu3000.de>.
Thomas Mueller wrote:

>> I'm not concerned about an implementation not being able to 'support'
>> setLimit(). I rather think of applications using that new method and at the same
>> time wish to find out the total number of matches, as written initially by
>> Christoph.
> 
> I understand.
> 
>> keep the setLimit() as is. but introduce a getSize() (or getTotalMatches()?) on
>> the QueryResult. This method always returns the total number of nodes/rows
>> independent of setLimit().
> 
> That's a good idea! Implementations that can't support it efficiently
> could then calculate the size only when required. What about
> getTotalSize()?

Implementations should maybe even be allowed to return -1 (as on 
RangeIterator.getSize()) if they do not support this method ...

Cheers,
Christoph


Re: Total size of a query result and setLimit()

Posted by Thomas Mueller <th...@gmail.com>.
Hi,

> I'm not concerned about an implementation not being able to 'support'
> setLimit(). I rather think of applications using that new method and at the same
> time wish to find out the total number of matches, as written initially by
> Christoph.

I understand.

> keep the setLimit() as is. but introduce a getSize() (or getTotalMatches()?) on
> the QueryResult. This method always returns the total number of nodes/rows
> independent of setLimit().

That's a good idea! Implementations that can't support it efficiently
could then calculate the size only when required. What about
getTotalSize()?

Thomas

Re: Total size of a query result and setLimit()

Posted by Marcel Reutegger <ma...@gmx.net>.
Thomas Mueller wrote:
>> as a replacement for setLimit()
> 
> I wouldn't remove setLimit(). Maybe Lucene does not have this option,
> I don't know about that. But in the database world, it speeds up
> queries. Sometimes the database can use a faster access method. When
> ordering is used, Jackrabbit could simply ignore results that are (so
> far) outside the limit, saving memory and speeding up sorting.

I'm not concerned about an implementation not being able to 'support' 
setLimit(). I'm rather thinking of applications that use the new method and at 
the same time want to find out the total number of matches, as Christoph wrote 
initially.

Hmm, after some thinking here's another proposal:

Keep setLimit() as is, but introduce a getSize() (or getTotalMatches()?) on 
the QueryResult. This method always returns the total number of nodes/rows, 
independent of setLimit().
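
A minimal sketch of the proposed behaviour, under loud assumptions: Result
and its methods are hypothetical and only model the idea with an in-memory
list. A real implementation would not hold all matches in memory, and the
method name (getSize(), getTotalMatches() or getTotalSize()) is still open.

```java
import java.util.List;

public class TotalSizeSketch {

    /** Hypothetical result: knows all matches, exposes at most 'limit' of them. */
    public static final class Result<T> {
        private final List<T> matches;
        private final long limit; // negative means "no limit" (assumption of this sketch)

        public Result(List<T> matches, long limit) {
            this.matches = matches;
            this.limit = limit;
        }

        /** Returns the matches visible to the client, capped by the limit. */
        public List<T> getNodes() {
            int end = (limit < 0)
                    ? matches.size()
                    : (int) Math.min(limit, (long) matches.size());
            return matches.subList(0, end);
        }

        /** Returns the total number of matches, independent of the limit. */
        public long getTotalSize() {
            return matches.size();
        }
    }
}
```

With five matches and a limit of 2, getNodes() yields two items while
getTotalSize() still reports 5.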

regards
  marcel

Re: Total size of a query result and setLimit()

Posted by Thomas Mueller <th...@gmail.com>.
Hi,

> retrieve the total result size

Google gives you only an approximation of the total result size.
Databases: if you don't want the database engine to read the whole
result set, you need to run two queries (one of them to get the size).

> void setFetchSize(long size)

This would be a hint, and would not limit the number of rows the
application can read. This exists in the database world as well (as a
hint).

> as a replacement for setLimit()

I wouldn't remove setLimit(). Maybe Lucene does not have this option,
I don't know about that. But in the database world, it speeds up
queries. Sometimes the database can use a faster access method. When
ordering is used, Jackrabbit could simply ignore results that are (so
far) outside the limit, saving memory and speeding up sorting.

setLimit() allows some optimizations (in JCR implementations) that are
not possible with setFetchSize().

If the application uses paging, it can and should use setOffset() and
setLimit() in my view.
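
For paging, the values passed to setOffset() and setLimit() follow from the
page index and page size, and the number of pages follows from the total
size, however that is obtained (for example from a second, counting query).
A minimal sketch of the arithmetic; the class and method names are
hypothetical and no JCR calls are made here:

```java
public class PagingSketch {

    /** Offset of the first item on a zero-based page. */
    public static long offsetFor(long pageIndex, long pageSize) {
        if (pageIndex < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0 and pageSize > 0");
        }
        return pageIndex * pageSize;
    }

    /** Number of pages needed for totalSize items (ceiling division). */
    public static long pageCount(long totalSize, long pageSize) {
        if (totalSize < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("totalSize must be >= 0 and pageSize > 0");
        }
        return (totalSize + pageSize - 1) / pageSize;
    }
}
```

An application would pass offsetFor(page, pageSize) to setOffset() and
pageSize to setLimit(); without the total size it cannot compute pageCount().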

Thomas

Re: Total size of a query result and setLimit()

Posted by Marcel Reutegger <ma...@gmx.net>.
Christoph Kiehl wrote:
> while fixing a little bug in rev 566778 I became aware that there is no 
> possibility to retrieve the total result size of a query anymore if 
> setLimit() is used. But I need that information and I think I'm not 
> alone. The question is how to implement this? Should this maybe even be 
> covered by jsr 283? The method could be either implemented on the 
> LazyScoreNodeIterator (RangeIterator) but maybe it is more appropriate 
> for LazyQueryResultImpl/QueryResultImpl (QueryResult) because limits are 
> specific to querying. WDYT?

The public review version of JSR 283 contains only little information about the 
two methods setLimit(long) and setOffset(long). They even carry TODO remarks 
asking whether the methods should be removed again.

As currently specified, they don't seem very useful to me. I think the spec 
(and of course also Jackrabbit) should be changed in the following way:

void setOffset(long offset)

Sets the start offset of the query result to _offset_. Setting an offset does 
not modify the size of the NodeIterator or RowIterator returned by the 
QueryResult. The following two code fragments behave equivalently from a client 
perspective (if there are 10 or more matching nodes):

Query q = ...
q.setOffset(10);
NodeIterator it = q.execute().getNodes();

is equivalent to:

Query q = ...
NodeIterator it = q.execute().getNodes();
it.skip(10);

The first fragment is considered more efficient because it allows an 
implementation to optimize access to the range of nodes in the query result 
that a client is actually interested in.
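
The equivalence can be modelled without any JCR classes. The following is
only a sketch over an in-memory list: withOffset(), withSkip() and drain()
are hypothetical helpers, and the sketch simply stops at the end of the list
instead of modelling what skip() does when fewer items are available.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class OffsetVsSkipSketch {

    /** Variant 1: the offset is applied before the iterator is handed out. */
    public static <T> Iterator<T> withOffset(List<T> matches, int offset) {
        int start = Math.min(offset, matches.size());
        return matches.subList(start, matches.size()).iterator();
    }

    /** Variant 2: the client skips items on an iterator over all matches. */
    public static <T> Iterator<T> withSkip(List<T> matches, int skip) {
        Iterator<T> it = matches.iterator();
        for (int i = 0; i < skip && it.hasNext(); i++) {
            it.next();
        }
        return it;
    }

    /** Collects the remaining items of an iterator into a list. */
    public static <T> List<T> drain(Iterator<T> it) {
        List<T> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }
}
```

Both variants hand the client the same remaining items; the difference is
that in the first one the implementation knows the offset up front and never
has to produce the skipped items.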


and as a replacement for setLimit():

void setFetchSize(long size)

Gives the QueryManager a hint as to the number of rows/nodes that should be 
fetched from the workspace when more rows/nodes are needed. The number of 
rows/nodes specified affects only QueryResults created using this Query. If the 
value specified is zero, then the hint is ignored. The default value is zero.

WDYT?

regards
  marcel