Posted to dev@jackrabbit.apache.org by Christoph Kiehl <ch...@sulu3000.de> on 2007/08/16 19:09:31 UTC
Total size of a query result and setLimit()
Hi,
while fixing a little bug in rev 566778 I became aware that there is no
longer any way to retrieve the total result size of a query if setLimit()
is used. But I need that information and I think I'm not alone. The question is
how to implement this. Should this maybe even be covered by JSR 283? The method
could be implemented on the LazyScoreNodeIterator (RangeIterator), but maybe
it is more appropriate for LazyQueryResultImpl/QueryResultImpl (QueryResult)
because limits are specific to querying. WDYT?
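A minimal sketch of the problem, using hypothetical in-memory classes (SimpleRangeIterator, LimitedQuery) rather than the actual Jackrabbit classes. It assumes the limit is applied before the iterator is built, so getSize() can only report the size of the limited range and the total is lost:

```java
import java.util.Arrays;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a JCR RangeIterator: getSize() reports how many
// elements this iterator will return (not the real Jackrabbit class).
class SimpleRangeIterator<T> {
    private final List<T> items;
    private int position = 0;

    SimpleRangeIterator(List<T> items) {
        this.items = items;
    }

    long getSize() {
        return items.size();
    }

    boolean hasNext() {
        return position < items.size();
    }

    T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return items.get(position++);
    }
}

// Hypothetical query: the limit truncates the hit list *before* the iterator
// is created, so nothing downstream knows how many hits actually matched.
class LimitedQuery {
    private final List<String> allHits;
    private long limit = Long.MAX_VALUE; // "no limit" in this sketch

    LimitedQuery(List<String> allHits) {
        this.allHits = allHits;
    }

    void setLimit(long limit) {
        this.limit = limit;
    }

    SimpleRangeIterator<String> execute() {
        int end = (int) Math.max(0, Math.min(limit, allHits.size()));
        return new SimpleRangeIterator<>(allHits.subList(0, end));
    }

    public static void main(String[] args) {
        LimitedQuery q = new LimitedQuery(Arrays.asList("a", "b", "c", "d", "e"));
        System.out.println(q.execute().getSize()); // 5: all matches
        q.setLimit(2);
        System.out.println(q.execute().getSize()); // 2: the total of 5 is no longer visible
    }
}
```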
Cheers,
Christoph
Re: Total size of a query result and setLimit()
Posted by Christoph Kiehl <ch...@sulu3000.de>.
Marcel Reutegger wrote:
> Hmm, after some thinking here's another proposal:
>
> keep the setLimit() as is, but introduce a getSize() (or
> getTotalMatches()?) on the QueryResult. This method always returns the
> total number of nodes/rows independent of setLimit().
That was what I was trying to suggest in my original post ;) I would go for
getTotalSize() as Thomas suggested.
Cheers,
Christoph
Re: Total size of a query result and setLimit()
Posted by Christoph Kiehl <ch...@sulu3000.de>.
Jukka Zitting wrote:
> Hi,
>
> On 8/17/07, Christoph Kiehl <ch...@sulu3000.de> wrote:
>> Thomas Mueller wrote:
>>> That's a good idea! Implementations that can't support it efficiently
>>> could then calculate the size only when required. What about
>>> getTotalSize()?
>> Implementations should maybe even be allowed to return -1 (as on
>> RangeIterator.getSize()) if they do not support this method ...
>
> I don't like the -1 result. As long as it's allowed, an interoperable
> client must always assume that an implementation may return -1 and
> provide a workaround for such cases.
I don't like it either, but I have absolutely no idea what kinds of systems out there
are trying to implement the JCR API and whether they are capable of providing a total
result size. Until now, without setLimit(), they weren't forced to return the
total size in RangeIterator.getSize(), so I thought it might be necessary to
provide that option for other implementors, because at least one of them seems
to need that -1 return value. But if it isn't necessary I would definitely like
to skip that option.
> How is the total size question typically solved in cases where an
> application pages through a large database result set? I recall
> sometimes using a separate COUNT(*) query for that, but there may be
> more efficient alternatives.
That's what Thomas suggested as well and I know of no other option.
Cheers,
Christoph
Re: Total size of a query result and setLimit()
Posted by Jukka Zitting <ju...@gmail.com>.
Hi,
On 8/17/07, Christoph Kiehl <ch...@sulu3000.de> wrote:
> Thomas Mueller wrote:
> > That's a good idea! Implementations that can't support it efficiently
> > could then calculate the size only when required. What about
> > getTotalSize()?
>
> Implementations should maybe even be allowed to return -1 (as on
> RangeIterator.getSize()) if they do not support this method ...
I don't like the -1 result. As long as it's allowed, an interoperable
client must always assume that an implementation may return -1 and
provide a workaround for such cases.
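To make the interoperability concern concrete: if getTotalSize() were allowed to return -1, every portable client would need a fallback like the one below. TotalSizeResult and totalSizeOf are hypothetical names for this sketch, and the fallback simply counts by iterating, which is exactly the cost the method was meant to avoid.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical result interface: getTotalSize() may return -1 when the
// implementation cannot determine the size, as RangeIterator.getSize() may.
interface TotalSizeResult {
    long getTotalSize();

    // Iterates over all matches (ignoring any limit) for the fallback below.
    Iterator<String> allMatches();
}

class TotalSizeClient {
    // Workaround a client needs as long as -1 is a permitted answer:
    // count the matches one by one, potentially loading every node.
    static long totalSizeOf(TotalSizeResult result) {
        long size = result.getTotalSize();
        if (size >= 0) {
            return size;
        }
        long count = 0;
        for (Iterator<String> it = result.allMatches(); it.hasNext(); it.next()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> hits = Arrays.asList("a", "b", "c");
        TotalSizeResult unknownSize = new TotalSizeResult() {
            public long getTotalSize() { return -1; }
            public Iterator<String> allMatches() { return hits.iterator(); }
        };
        System.out.println(totalSizeOf(unknownSize)); // 3, found by iterating
    }
}
```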
How is the total size question typically solved in cases where an
application pages through a large database result set? I recall
sometimes using a separate COUNT(*) query for that, but there may be
more efficient alternatives.
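The separate-count pattern can be sketched as two queries over the same condition: one that counts all matches and one that fetches a single page. The in-memory list and the TwoQueryPaging class are hypothetical stand-ins for a database table and SQL statements. The comments show the SQL each method roughly corresponds to; LIMIT/OFFSET syntax varies between databases.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical in-memory stand-in for a table; the point is the access
// pattern (count query + page query), not the storage.
class TwoQueryPaging {
    // Roughly: SELECT COUNT(*) FROM t WHERE <where>
    static long count(List<String> table, Predicate<String> where) {
        return table.stream().filter(where).count();
    }

    // Roughly: SELECT * FROM t WHERE <where> LIMIT <limit> OFFSET <offset>
    static List<String> fetchPage(List<String> table, Predicate<String> where,
                                  long offset, long limit) {
        return table.stream()
                .filter(where)
                .skip(offset)
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> table = Arrays.asList("apple", "avocado", "banana", "apricot", "cherry");
        Predicate<String> startsWithA = s -> s.startsWith("a");
        System.out.println(count(table, startsWithA));           // 3
        System.out.println(fetchPage(table, startsWithA, 1, 1)); // [avocado]
    }
}
```

Because the two queries run separately, in a live database the count can disagree with the page contents if the data changes in between.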
BR,
Jukka Zitting
Re: Total size of a query result and setLimit()
Posted by Christoph Kiehl <ch...@sulu3000.de>.
Thomas Mueller wrote:
>> I'm not concerned about an implementation not being able to 'support'
>> setLimit(). I'm rather thinking of applications that use that new method and at the same
>> time wish to find out the total number of matches, as written initially by
>> Christoph.
>
> I understand.
>
>> keep the setLimit() as is, but introduce a getSize() (or getTotalMatches()?) on
>> the QueryResult. This method always returns the total number of nodes/rows
>> independent of setLimit().
>
> That's a good idea! Implementations that can't support it efficiently
> could then calculate the size only when required. What about
> getTotalSize()?
Implementations should maybe even be allowed to return -1 (as on
RangeIterator.getSize()) if they do not support this method ...
Cheers,
Christoph
Re: Total size of a query result and setLimit()
Posted by Thomas Mueller <th...@gmail.com>.
Hi,
> I'm not concerned about an implementation not being able to 'support'
> setLimit(). I'm rather thinking of applications that use that new method and at the same
> time wish to find out the total number of matches, as written initially by
> Christoph.
I understand.
> keep the setLimit() as is, but introduce a getSize() (or getTotalMatches()?) on
> the QueryResult. This method always returns the total number of nodes/rows
> independent of setLimit().
That's a good idea! Implementations that can't support it efficiently
could then calculate the size only when required. What about
getTotalSize()?
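Computing the size only when required can be sketched as a result object that defers the (possibly expensive) count until getTotalSize() is first called and then caches it. LazyTotalSizeResult is a hypothetical class, and the LongSupplier stands in for whatever counting an implementation would actually do.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical QueryResult-like class. The hits within the limit are available
// immediately; the total match count is computed on first request only.
// Not thread-safe: concurrent first calls could each run the counter.
class LazyTotalSizeResult {
    private final List<String> limitedHits;
    private final LongSupplier totalCounter;
    private long totalSize = -1; // internal marker: not computed yet

    LazyTotalSizeResult(List<String> limitedHits, LongSupplier totalCounter) {
        this.limitedHits = limitedHits;
        this.totalCounter = totalCounter;
    }

    List<String> getHits() {
        return limitedHits;
    }

    // Total number of matches, independent of the limit; computed lazily.
    long getTotalSize() {
        if (totalSize < 0) {
            totalSize = totalCounter.getAsLong();
        }
        return totalSize;
    }

    public static void main(String[] args) {
        LazyTotalSizeResult result = new LazyTotalSizeResult(
                Arrays.asList("a", "b"),
                () -> {
                    System.out.println("counting all matches...");
                    return 1234L;
                });
        System.out.println(result.getHits());      // no counting happens here
        System.out.println(result.getTotalSize()); // triggers the count once
        System.out.println(result.getTotalSize()); // served from the cache
    }
}
```

Clients that never ask for the total never pay for it, which is the property that makes the method cheap to add for implementations that cannot count efficiently.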
Thomas
Re: Total size of a query result and setLimit()
Posted by Marcel Reutegger <ma...@gmx.net>.
Thomas Mueller wrote:
>> as a replacement for setLimit()
>
> I wouldn't remove setLimit(). Maybe Lucene does not have this option,
> I don't know about that. But in the database world, it speeds up
> queries. Sometimes the database can use a faster access method. When
> ordering is used, Jackrabbit could simply ignore results that are (so
> far) outside the limit, saving memory and speeding up sorting.
I'm not concerned about an implementation not being able to 'support'
setLimit(). I'm rather thinking of applications that use that new method and at the same
time wish to find out the total number of matches, as written initially by
Christoph.
Hmm, after some thinking here's another proposal:
keep the setLimit() as is, but introduce a getSize() (or getTotalMatches()?) on
the QueryResult. This method always returns the total number of nodes/rows
independent of setLimit().
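A minimal sketch of this proposal, with hypothetical classes (ProposedQuery, ProposedResult) and getTotalSize() as the method name suggested elsewhere in this thread. setLimit() still restricts what the client iterates over, while the result separately reports the total number of matches:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical query: setLimit() behaves as before and only restricts the
// nodes/rows handed to the client.
class ProposedQuery {
    private final List<String> matches;
    private long limit = Long.MAX_VALUE; // "no limit" in this sketch

    ProposedQuery(List<String> matches) {
        this.matches = matches;
    }

    void setLimit(long limit) {
        this.limit = limit;
    }

    ProposedResult execute() {
        return new ProposedResult(matches, limit);
    }
}

// Hypothetical result carrying both views of the same match list.
class ProposedResult {
    private final List<String> matches;
    private final long limit;

    ProposedResult(List<String> matches, long limit) {
        this.matches = matches;
        this.limit = limit;
    }

    // What the client iterates over: at most `limit` matches.
    List<String> getNodes() {
        int end = (int) Math.max(0, Math.min(limit, matches.size()));
        return matches.subList(0, end);
    }

    // Total number of matches, always independent of setLimit().
    long getTotalSize() {
        return matches.size();
    }

    public static void main(String[] args) {
        ProposedQuery q = new ProposedQuery(Arrays.asList("a", "b", "c", "d", "e"));
        q.setLimit(2);
        ProposedResult r = q.execute();
        System.out.println(r.getNodes());     // [a, b]
        System.out.println(r.getTotalSize()); // 5
    }
}
```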
regards
marcel
Re: Total size of a query result and setLimit()
Posted by Thomas Mueller <th...@gmail.com>.
Hi,
> retrieve the total result size
Google gives you only an approximation of the total result size.
Databases: if you don't want the database engine to read the whole
result set, you need to run two queries (one to get the size).
> void setFetchSize(long size)
This would be a hint, and would not limit the number of rows the
application can read. This exists in the database world as well (as a
hint).
> as a replacement for setLimit()
I wouldn't remove setLimit(). Maybe Lucene does not have this option,
I don't know about that. But in the database world, it speeds up
queries. Sometimes the database can use a faster access method. When
ordering is used, Jackrabbit could simply ignore results that are (so
far) outside the limit, saving memory and speeding up sorting.
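The ordering optimization described here is the classic bounded top-k technique. A minimal sketch, not Jackrabbit's actual sorting code: keep only the best `limit` hits seen so far in a heap whose head is the worst kept hit, drop any new hit that cannot make the cut, and sort just the survivors at the end. Memory stays proportional to the limit rather than to the number of matches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Generic bounded top-k selection, shown as an illustration of how a limit
// can help an ordered query; not taken from Jackrabbit.
class TopK {
    // Returns the `limit` smallest hits according to `order`, sorted.
    static <T> List<T> smallest(Iterable<T> hits, int limit, Comparator<? super T> order) {
        if (limit <= 0) {
            return new ArrayList<>();
        }
        // Reversed order: the head of the heap is the worst hit kept so far.
        PriorityQueue<T> heap = new PriorityQueue<>(limit, order.reversed());
        for (T hit : hits) {
            if (heap.size() < limit) {
                heap.add(hit);
            } else if (order.compare(hit, heap.peek()) < 0) {
                heap.poll(); // evict the current worst
                heap.add(hit);
            }
            // Otherwise the hit is outside the limit so far and is dropped.
        }
        List<T> result = new ArrayList<>(heap);
        result.sort(order);
        return result;
    }

    public static void main(String[] args) {
        List<Integer> scores = Arrays.asList(42, 7, 19, 3, 88, 11);
        System.out.println(smallest(scores, 3, Comparator.naturalOrder())); // [3, 7, 11]
    }
}
```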
setLimit() allows some optimizations (in JCR implementations) that are
not possible with setFetchSize().
If the application uses paging, it can and should use setOffset() and
setLimit() in my view.
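For page-based navigation, the arithmetic an application would pair with setOffset() and setLimit() is small. Note that the number of pages needs the total match count, which is what this thread is about. Paging is a hypothetical helper. The commented calls assume a Query offering the setOffset(long) and setLimit(long) methods discussed here.

```java
// Hypothetical paging helper. An application showing page `pageIndex`
// (zero-based) would call, on a query supporting the methods discussed here:
//   query.setOffset(Paging.offsetFor(pageIndex, pageSize));
//   query.setLimit(pageSize);
class Paging {
    // Position of the first match on the requested page.
    static long offsetFor(long pageIndex, long pageSize) {
        return pageIndex * pageSize;
    }

    // Number of pages for a given total match count (ceiling division).
    // Requires the total size, which setLimit() alone does not provide.
    static long pageCount(long totalSize, long pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        return (totalSize + pageSize - 1) / pageSize;
    }

    public static void main(String[] args) {
        System.out.println(offsetFor(2, 10));  // 20: the third page starts at match 20
        System.out.println(pageCount(25, 10)); // 3
        System.out.println(pageCount(0, 10));  // 0
    }
}
```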
Thomas
Re: Total size of a query result and setLimit()
Posted by Marcel Reutegger <ma...@gmx.net>.
Christoph Kiehl wrote:
> while fixing a little bug in rev 566778 I became aware that there is no
> longer any way to retrieve the total result size of a query if
> setLimit() is used. But I need that information and I think I'm not
> alone. The question is how to implement this. Should this maybe even be
> covered by JSR 283? The method could be implemented on the
> LazyScoreNodeIterator (RangeIterator), but maybe it is more appropriate
> for LazyQueryResultImpl/QueryResultImpl (QueryResult) because limits are
> specific to querying. WDYT?
The public review version of JSR 283 contains only a little information about the
two methods setLimit(long) and setOffset(long). It even contains TODO remarks
about whether the methods should be removed again.
As they are currently specified, they don't seem very useful to me. I
think the spec (and of course also Jackrabbit) should be changed in the following way:
void setOffset(long offset)
Sets the start offset of the query result to _offset_. Setting an offset does
not modify the size of the NodeIterator or RowIterator returned by the
QueryResult. The following two code fragments behave equivalently from a client
perspective (if there are 10 or more matching nodes):
    Query q = ...
    q.setOffset(10);
    NodeIterator it = q.execute().getNodes();
is equivalent to:
    Query q = ...
    NodeIterator it = q.execute().getNodes();
    it.skip(10);
The first fragment is considered more efficient because it allows an implementation
to optimize access to the range of nodes in the query result that a client is
actually interested in.
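A small model of the proposed semantics, with hypothetical classes (SketchQuery, SketchNodeIterator) standing in for Query and NodeIterator. Under the proposal, an offset leaves getSize() unchanged and simply starts iteration at that position, so both fragments end in the same iterator state. A real implementation could use the offset to avoid loading the skipped nodes; this in-memory sketch cannot show that saving.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for NodeIterator/RangeIterator semantics.
class SketchNodeIterator {
    private final List<String> nodes;
    private long position;

    SketchNodeIterator(List<String> nodes, long startPosition) {
        this.nodes = nodes;
        this.position = startPosition;
    }

    // Unaffected by the offset, as the proposal specifies.
    long getSize() {
        return nodes.size();
    }

    long getPosition() {
        return position;
    }

    boolean hasNext() {
        return position < nodes.size();
    }

    String nextNode() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return nodes.get((int) position++);
    }

    // Like RangeIterator.skip(): fails if it would move past the last element.
    void skip(long n) {
        if (n < 0 || position + n > nodes.size()) {
            throw new NoSuchElementException();
        }
        position += n;
    }
}

// Hypothetical query supporting the proposed setOffset().
class SketchQuery {
    private final List<String> matches;
    private long offset = 0;

    SketchQuery(List<String> matches) {
        this.matches = matches;
    }

    void setOffset(long offset) {
        this.offset = offset;
    }

    SketchNodeIterator execute() {
        return new SketchNodeIterator(matches, offset);
    }

    public static void main(String[] args) {
        List<String> matches = new ArrayList<>();
        for (int i = 0; i < 12; i++) {
            matches.add("node" + i);
        }

        SketchQuery withOffset = new SketchQuery(matches);
        withOffset.setOffset(10);
        SketchNodeIterator a = withOffset.execute();

        SketchNodeIterator b = new SketchQuery(matches).execute();
        b.skip(10);

        // Both report size 12, position 10, and next node "node10".
        System.out.println(a.getSize() + " " + a.getPosition() + " " + a.nextNode());
        System.out.println(b.getSize() + " " + b.getPosition() + " " + b.nextNode());
    }
}
```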
and as a replacement for setLimit():
void setFetchSize(long size)
Gives the QueryManager a hint as to the number of rows/nodes that should be
fetched from the workspace when more rows/nodes are needed. The number of
rows/nodes specified affects only QueryResults created using this Query. If the
value specified is zero, then the hint is ignored. The default value is zero.
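A sketch of how such a hint could work, assuming an iterator that pulls matches from the workspace in batches. The fetch size only changes how many round trips are made, never how many matches the client can read, which is the difference from setLimit() raised earlier in the thread. BatchingIterator and DEFAULT_BATCH are hypothetical; the spec text above does not say what batch size an implementation uses when the hint is zero.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical iterator honouring a fetch-size hint. `store` stands in for
// the workspace; each refill of the buffer counts as one round trip.
class BatchingIterator {
    // Assumed implementation default, used when the hint is zero (ignored).
    static final int DEFAULT_BATCH = 100;

    private final List<String> store;
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();
    private int bufferPos = 0;
    private int loaded = 0;  // matches fetched from the store so far
    private int fetches = 0; // round trips made so far

    BatchingIterator(List<String> store, long fetchSize) {
        this.store = store;
        // Zero (or, in this sketch, a negative value) means "no hint".
        this.batchSize = fetchSize <= 0
                ? DEFAULT_BATCH
                : (int) Math.min(fetchSize, Integer.MAX_VALUE);
    }

    boolean hasNext() {
        return bufferPos < buffer.size() || loaded < store.size();
    }

    String next() {
        if (bufferPos == buffer.size()) {
            if (loaded >= store.size()) {
                throw new NoSuchElementException();
            }
            // Refill: fetch the next batch from the store.
            int end = (int) Math.min((long) loaded + batchSize, store.size());
            buffer.clear();
            buffer.addAll(store.subList(loaded, end));
            bufferPos = 0;
            loaded = end;
            fetches++;
        }
        return buffer.get(bufferPos++);
    }

    int getFetchCount() {
        return fetches;
    }

    public static void main(String[] args) {
        List<String> store = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            store.add("node" + i);
        }
        BatchingIterator it = new BatchingIterator(store, 10);
        int read = 0;
        while (it.hasNext()) {
            it.next();
            read++;
        }
        // All 25 matches are readable; they arrived in 3 batches (10 + 10 + 5).
        System.out.println(read + " matches in " + it.getFetchCount() + " fetches");
    }
}
```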
WDYT?
regards
marcel