Posted to dev@jena.apache.org by Andy Seaborne <an...@epimorphics.com> on 2011/08/08 16:35:42 UTC

Re: SPARQL queries and paging


On 08/08/11 14:35, Simon Helsen wrote:
> On this topic, I'd like to point out that we have a separate outside
> mechanism for paging which behaves like you suggest, however, the
> difference with OFFSET/LIMIT is that the next time someone makes the query
> e.g. to obtain the next page, we do expect the query to be recalculated
> since the state of the store may have changed.
>
> So, if you plan to change the behavior by introducing a caching model, you
> may actually alter the behavior unless you are able to determine that a
> subsequent execution of a query would not have changed results (e.g. by
> having the actions isolated in a transaction?)
>
> Simon

Transactions, or just Fuseki noting updates (via the update language or the
graph store protocol), can be used to give a version id to each state, and
then ETags can be used to drive cache invalidation.
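
A minimal sketch of the version-id idea, in Python rather than anything Jena- or Fuseki-specific. The names VersionedStore and EtagCache are hypothetical, and a plain Python callable stands in for SPARQL evaluation. It only illustrates the mechanism: every update bumps a version id, the version id is served as the ETag, and the cache revalidates against it before reusing a stored result.

```python
from typing import Callable, Dict, List, Optional, Tuple

Query = Callable[[List[int]], List[int]]


class VersionedStore:
    """Stand-in for the core DB: every update bumps a version id."""

    def __init__(self) -> None:
        self.rows: List[int] = []
        self.version = 0

    def update(self, value: int) -> None:
        self.rows.append(value)
        self.version += 1  # new state, new version id

    def etag(self) -> str:
        return f'"v{self.version}"'

    def query(self, fn: Query, if_none_match: Optional[str] = None
              ) -> Tuple[str, Optional[List[int]]]:
        """Return (etag, results), or (etag, None) when the caller's ETag
        is still current (the analogue of an HTTP 304 Not Modified)."""
        tag = self.etag()
        if if_none_match == tag:
            return tag, None
        return tag, fn(self.rows)


class EtagCache:
    """Stand-in for the SPARQL cache layer sitting in front of the store."""

    def __init__(self, store: VersionedStore) -> None:
        self.store = store
        self.entries: Dict[str, Tuple[str, List[int]]] = {}
        self.hits = 0

    def query(self, key: str, fn: Query) -> List[int]:
        cached = self.entries.get(key)
        tag, results = self.store.query(fn, cached[0] if cached else None)
        if results is None and cached is not None:
            self.hits += 1  # store unchanged: reuse the cached result set
            return cached[1]
        assert results is not None
        self.entries[key] = (tag, results)
        return results
```

In this sketch the cache always asks the store whether its ETag is still valid, so it never serves a stale result; the saving is in not recomputing or retransmitting the result set.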

It's a three-layer model:

   client
   SPARQL cache
   core DB.

ETags are used between the SPARQL cache and the core DB.

The protocol between each layer is the SPARQL protocol.

The SPARQL cache can keep whole result sets for pseudo paging using 
ORDER/OFFSET/LIMIT with different policies on

The cache could also augment the SPARQL protocol with parameters like
?page= or ?liveness=uselastquery for better control (in addition to
trying to intuit from requests and version ids).
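
As a rough illustration of pseudo paging from a cached whole result set, here is a sketch in Python. The PagingCache class, its callbacks, the zero-based page index, and the "fresh" default are all assumptions. The message names ?page= and ?liveness=uselastquery but does not define their semantics, so the reading used here (uselastquery serves pages from the last stored result set even if the store has since changed) is only one possible interpretation.

```python
from typing import Callable, Dict, List, Tuple


class PagingCache:
    """Hypothetical SPARQL cache layer holding whole, ordered result sets."""

    def __init__(self,
                 run_query: Callable[[str], List[str]],
                 current_version: Callable[[], int]) -> None:
        self.run_query = run_query              # executes against the core DB
        self.current_version = current_version  # version id of the DB state
        self.results: Dict[str, Tuple[int, List[str]]] = {}

    def page(self, query: str, page: int, page_size: int,
             liveness: str = "fresh") -> List[str]:
        version = self.current_version()
        cached = self.results.get(query)
        # Reuse the stored result set if the caller asked for the last
        # query's results, or if the store has not changed since.
        reuse = cached is not None and (
            liveness == "uselastquery" or cached[0] == version)
        if not reuse:
            cached = (version, self.run_query(query))
            self.results[query] = cached
        assert cached is not None
        start = page * page_size  # assumption: pages are zero-based
        return cached[1][start:start + page_size]
```

The tradeoff is visible in the liveness flag: reusing the last result set gives stable pages at the cost of possibly stale data, while the default re-executes the query whenever the version id has moved.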

Being able to set different consistency/cache efficiency tradeoffs in
the client or the cache server might be useful.

	Andy

Re: SPARQL queries and paging

Posted by Simon Helsen <sh...@ca.ibm.com>.
Andy,

Yes. We don't use Fuseki, but we have a similar 3-tier architecture. My
concern was more that the core DB should not employ an implicit cache
that its clients cannot control, but from your explanation I think we
are on the same page. In fact, we also use ETags and augment "sparql"
with query parameters for cached paging. The only difference is that we
move cached paging entirely outside of SPARQL, e.g.

we have something like POST ...?query&pageSize=... (where the query is in 
the body)

and the answer provides a unique token which can be used to browse the 
pages until they expire, e.g.

GET ....?token=<myToken>&pageSize=...&page=...

If our clients do not want the caching behavior, we tell them to use 
OFFSET/LIMIT instead.
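
The token scheme described above could be sketched roughly as follows, in Python. This is not the actual implementation: TokenPager, its method names, the zero-based page index, and the time-to-live expiry policy are all hypothetical. The sketch only shows the shape of the idea: the POST snapshots the result set and returns an opaque token, and later GETs page through that snapshot until it expires.

```python
import time
import uuid
from typing import Callable, Dict, List, Optional, Tuple


class TokenPager:
    """Hypothetical server-side snapshot cache addressed by an opaque token."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self.snapshots: Dict[str, Tuple[float, List[str]]] = {}

    def post_query(self, results: List[str]) -> str:
        """Analogue of POST ...?query&pageSize=...: store a copy of the
        result set and hand back a token for browsing it."""
        token = uuid.uuid4().hex
        self.snapshots[token] = (self.clock() + self.ttl, list(results))
        return token

    def get_page(self, token: str, page: int,
                 page_size: int) -> Optional[List[str]]:
        """Analogue of GET ....?token=...&pageSize=...&page=...
        Returns None when the token is unknown or has expired."""
        entry = self.snapshots.get(token)
        if entry is None or self.clock() >= entry[0]:
            self.snapshots.pop(token, None)  # drop expired snapshots
            return None
        start = page * page_size  # assumption: pages are zero-based
        return entry[1][start:start + page_size]
```

Because the snapshot is copied at POST time, later changes to the store do not affect pages fetched with the token, which is the behavioral difference from re-running an OFFSET/LIMIT query for each page.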

Simon


