You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@jena.apache.org by Giorgio Stefanoni <gi...@cs.ox.ac.uk> on 2015/10/15 12:39:43 UTC

[Query Optimisation] Obtaining cardinality estimation for SPARQL queries in JENA/ARQ

Hello,

I recently read the paper ‘SPARQL Basic Graph Pattern Optimisation Using Selectivity Estimation’ by Markus Stocker et al. from WWW 2008. This paper proposes a new method based on graph statistics and heuristics for  estimating the result size of a simple SPARQL query (simple = conjunction of BGPs). As far as I know, this approach was implemented and was part of ARQ.

I have a couple of questions regarding static query optimisation in JENA/ARQ:
Does JENA/ARQ query optimiser still follow the approach by Markus Stocker et al? If no, is there a place where I can read about query optimisation in JENA/ARQ?
Given a SPARQL query q, is it possible to obtain the estimated size of the result set of q? I tried following this example (https://jena.apache.org/documentation/query/explain.html <https://jena.apache.org/documentation/query/explain.html>), however, the result of explaining the query does not include the cardinality estimation.
Best regards,

Giorgio

Re: [Query Optimisation] Obtaining cardinality estimation for SPARQL queries in JENA/ARQ

Posted by Giorgio Stefanoni <gi...@cs.ox.ac.uk>.
Thank you for your exhaustive reply!

Best regards,

Giorgio 

> On 17 Oct 2015, at 10:16, Andy Seaborne <an...@apache.org> wrote:
> 
>> On 15/10/15 11:39, Giorgio Stefanoni wrote:
>> Hello,
>> 
>> I recently read the paper ‘SPARQL Basic Graph Pattern Optimisation Using Selectivity Estimation’ by Markus Stocker et al. from WWW 2008. This paper proposes a new method based on graph statistics and heuristics for  estimating the result size of a simple SPARQL query (simple = conjunction of BGPs). As far as I know, this approach was implemented and was part of ARQ.
> 
> Markus used and implemented his work based on ARQ; it is not in the distribution.
> 
>> 
>> I have a couple of questions regarding static query optimisation in JENA/ARQ:
>> Does JENA/ARQ query optimiser still follow the approach by Markus Stocker et al? If no, is there a place where I can read about query optimisation in JENA/ARQ?
>> Given a SPARQL query q, is it possible to obtain the estimated size of the result set of q? I tried following this example (https://jena.apache.org/documentation/query/explain.html <https://jena.apache.org/documentation/query/explain.html>), however, the result of explaining the query does not include the cardinality estimation.
>> Best regards,
>> 
>> Giorgio
> 
> Query optimization happens in two stages:
> 
> 1/ Rewrite the algebra to better algebra (called the "high level") such
> as filter placement.  See Algebra.optimize
> 
> 2/ Reordering basic graph patterns is done as the query executes.  You will see the effect as the query executes. ReorderWeighted, ReorderFixed.
> 
> https://jena.apache.org/documentation/tdb/optimizer.html
> 
> The "fixed" style is applied in-memory as well.  In practice, the fixed algorithm does a reliable and fairly good job in the majority of cases.
> Stats based optimization is only really needed when that fails.
> 
> Optimization does not try to guess the size of the result set.  It tries to find a faster way to execute the query mainly by replacing, fairly conservatively, one way of executing with another.
> 
>    Andy

Re: [Query Optimisation] Obtaining cardinality estimation for SPARQL queries in JENA/ARQ

Posted by Andy Seaborne <an...@apache.org>.
On 15/10/15 11:39, Giorgio Stefanoni wrote:
> Hello,
>
> I recently read the paper ‘SPARQL Basic Graph Pattern Optimisation Using Selectivity Estimation’ by Markus Stocker et al. from WWW 2008. This paper proposes a new method based on graph statistics and heuristics for  estimating the result size of a simple SPARQL query (simple = conjunction of BGPs). As far as I know, this approach was implemented and was part of ARQ.

Markus used and implemented his work based on ARQ; it is not in the 
distribution.

>
> I have a couple of questions regarding static query optimisation in JENA/ARQ:
> Does JENA/ARQ query optimiser still follow the approach by Markus Stocker et al? If no, is there a place where I can read about query optimisation in JENA/ARQ?
> Given a SPARQL query q, is it possible to obtain the estimated size of the result set of q? I tried following this example (https://jena.apache.org/documentation/query/explain.html <https://jena.apache.org/documentation/query/explain.html>), however, the result of explaining the query does not include the cardinality estimation.
> Best regards,
>
> Giorgio
>

Query optimization happens in two stages:

1/ Rewrite the algebra to better algebra (called the "high level") such
as filter placement.  See Algebra.optimize

2/ Reordering basic graph patterns is done as the query executes.  You 
will see the effect as the query executes. ReorderWeighted, ReorderFixed.

https://jena.apache.org/documentation/tdb/optimizer.html

The "fixed" style is applied in-memory as well.  In practice, the fixed 
algorithm does a reliable and fairly good job in the majority of cases.
Stats based optimization is only really needed when that fails.

Optimization does not try to guess the size of the result set.  It tries 
to find a faster way to execute the query mainly by replacing, fairly 
conservatively, one way of executing with another.

	Andy