Posted to users@jena.apache.org by François-Paul Servant <fr...@gmail.com> on 2015/11/21 01:52:06 UTC

ResultSet and pagination

Hi,

I’m using TDB. No update.

and something like:
ResultSet getResultSet(String queryString) {
	Query query = QueryFactory.create(queryString) ;
	QueryExecution qexec = QueryExecutionFactory.create(query, model) ;
	return qexec.execSelect() ;
}

Is there any guarantee that, if I call getResultSet twice with the same string, the iterator returns the results in the same order both times?

I don’t want to use ORDER BY in the query because it requires going through all the results, and it is therefore slow when there are many results.

Andy, you wrote once here:
http://mail-archives.apache.org/mod_mbox/jena-users/201310.mbox/%3C5266E403.1090601@apache.org%3E
something that makes me hope that the order is generally stable, but I could be misunderstanding what you said:

> You then… have stabilized the results 
> against updates or against execution to give a different order (*rather 
> unlikely in Jena

did you mean that it is unlikely in Jena that the results would be returned in a different order when the query string is the same? If the order were stable 99% of the time, I would happily avoid going through all the results and sorting them: very often, people only look at the first page of results. If the query returns 1000 solutions, it is really a waste of time to list them all.

TIA

fps

Re: ResultSet and pagination

Posted by François-Paul Servant <fr...@gmail.com>.

OK, I should have said: in my use case, this is a competitive advantage (small dataset: 50 million triples, no natural order for the results, no ranking of them, and, in the 1000-solutions case, the possibility of returning hints to the user about how to refine the query without reading the 1000 solutions. And practical evidence, in the form of time measurements, that things are much faster when I don’t read the remaining 1000-n solutions).

fps


Re: ResultSet and pagination

Posted by james anderson <ja...@dydra.com>.

good morning;

> On 2015-11-23, at 09:31, François-Paul Servant <fr...@gmail.com> wrote:
> 
>>> On 21 Nov 2015, at 17:56, Andy Seaborne <an...@apache.org> wrote:
>>> 
>>> Hi,
>>> 
>>> In the current implementation, the results will be in the same order. Well, as far as I know.  I can't think of anything that will disturb it.  I think you realise the responsibility is yours; it is not guaranteed for all time.
> 
> on the other hand, a triple store implementation that would proudly claim that it ensures the stability of the order of iterator results would have a competitive advantage over those that do not … ;-)

on one hand, given the application described in an earlier message, the case remains to be made that the particular first-n-of-1000 matters. were it to matter, the order would be a stipulated criterion rather than one contingent on the storage architecture and/or query implementation, and an order operation would be necessary.

on the other, the dimensions of advantage remain to be defined. depending on the actual query, its implementation could involve parallel operations which yield non-deterministic result orders. in those cases, if the execution time is the dominant dimension of advantage, a stable, unsorted result would lose the competition.
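
To make the point about parallel operations concrete, here is a minimal sketch in plain Java. It is not Jena or any triple store's code; the class name ParallelOrderDemo and its methods are made up for illustration. Workers running in parallel deliver the same set of results, but the arrival order is whatever the scheduler produces, and only an explicit sort pins it down.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.stream.IntStream;

/** Hypothetical demo: parallel evaluation gives the same results in an unspecified order. */
final class ParallelOrderDemo {

    /** Collects 0..count-1 in whatever order the parallel workers happen to deliver them. */
    static List<Integer> arrivalOrder(int count) {
        Queue<Integer> seen = new ConcurrentLinkedQueue<>();
        // forEach on a parallel stream makes no promise about encounter order.
        IntStream.range(0, count).parallel().forEach(seen::add);
        return new ArrayList<>(seen);
    }

    /** Same results, but an explicit sort fixes the order at the cost of reading everything. */
    static List<Integer> sortedOrder(int count) {
        List<Integer> results = arrivalOrder(count);
        results.sort(Comparator.naturalOrder());
        return results;
    }
}
```

Two calls to arrivalOrder may or may not return the same sequence; two calls to sortedOrder always do, which is the trade-off between speed and stable order described above.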

best regards, from berlin,
---
james anderson | james@dydra.com | http://dydra.com

Re: ResultSet and pagination

Posted by François-Paul Servant <fr...@gmail.com>.

>> On 21 Nov 2015, at 17:56, Andy Seaborne <an...@apache.org> wrote:
>> 
>> Hi,
>> 
>> In the current implementation, the results will be in the same order. Well, as far as I know.  I can't think of anything that will disturb it.  I think you realise the responsibility is yours; it is not guaranteed for all time.

on the other hand, a triple store implementation that would proudly claim that it ensures the stability of the order of iterator results would have a competitive advantage over those that do not … ;-)

Thanks again. Best Regards,

fps

Re: ResultSet and pagination

Posted by François-Paul Servant <fr...@gmail.com>.

Hi Andy,

thank you very much.
Yes, my use case is a webapp. And yes, the responsibility is mine if I rely on the order of iterator results…

fps


Re: ResultSet and pagination

Posted by Andy Seaborne <an...@apache.org>.

Hi,

In the current implementation, the results will be in the same order. 
Well, as far as I know.  I can't think of anything that will disturb it. 
I think you realise the responsibility is yours; it is not guaranteed 
for all time.

Results are normally streaming and always truncate-able.  What might 
work for you is to display the top page from the iterator but keep the 
iterator around to read more (depends on the usage - hard to do in a 
webapp where each page is a separate HTTP request).  It's better if you 
can close the result set explicitly if truncating the results.
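
A minimal sketch of that pattern, in plain Java so it stands alone: the class PagedReader and its method names are hypothetical, not part of Jena. It assumes the results arrive as an ordinary Iterator (a Jena ResultSet is one) and that the caller supplies the cleanup action, which in Jena would typically be closing the QueryExecution.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical helper: hands out results page by page from one live iterator. */
final class PagedReader<T> implements AutoCloseable {
    private final Iterator<T> results;  // e.g. the ResultSet returned by execSelect()
    private final Runnable onClose;     // e.g. qexec::close, to release the query's resources
    private boolean closed = false;

    PagedReader(Iterator<T> results, Runnable onClose) {
        this.results = results;
        this.onClose = onClose;
    }

    /** Reads at most pageSize further results; the remaining ones are never touched. */
    List<T> nextPage(int pageSize) {
        List<T> page = new ArrayList<>();
        while (!closed && page.size() < pageSize && results.hasNext()) {
            page.add(results.next());
        }
        return page;
    }

    /** Call once the user stops paging, so a truncated result set is released. */
    @Override
    public void close() {
        if (!closed) {
            closed = true;
            onClose.run();
        }
    }
}
```

In a webapp the reader would have to be kept somewhere between requests (for instance in the user's session) and closed on a timeout, which is the hard part mentioned above.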

By the way, "ORDER BY / LIMIT N" is optimized to avoid an n log n sort.  It 
is still a traversal of the whole result set and isn't streaming.
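
To illustrate the general idea only (this is not Jena's actual implementation, and the class TopN is a made-up name): a top-N selection can keep a bounded heap of the N best items seen so far. Every item is still read once, but the cost is roughly n log N instead of n log n, and nothing can be returned until the input is exhausted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch of a top-N selection that avoids sorting the whole input. */
final class TopN {

    /** Returns the n smallest items in ascending order, reading every item exactly once. */
    static <T> List<T> smallest(Iterator<T> items, int n, Comparator<T> order) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        // Max-heap under `order`: the head is the worst candidate kept so far.
        PriorityQueue<T> heap = new PriorityQueue<>(n, Collections.reverseOrder(order));
        while (items.hasNext()) {
            T item = items.next();
            if (heap.size() < n) {
                heap.add(item);
            } else if (order.compare(item, heap.peek()) < 0) {
                heap.poll();     // drop the current worst candidate
                heap.add(item);  // keep the better item instead
            }
        }
        List<T> result = new ArrayList<>(heap);
        result.sort(order);      // sorts only the n retained items
        return result;
    }
}
```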

	Andy
