
Posted to users@jena.apache.org by Laurent Pellegrino <la...@gmail.com> on 2012/01/30 17:02:03 UTC

More information on query execution (with algebra)

Hi all,

Some context on why I am asking for more information: I have an
application where, each time it is called, a new SPARQL query (as a
String) is created from a template and a quadruple (a Java object).
This means that the relevant values from the quadruple have to be
formatted with a node formatter and inserted into the SPARQL template.
Then the SPARQL query has to be parsed and evaluated against a
dataset.

I was wondering whether I can skip the parsing step to save some time
when the application runs. It seems possible by working at the algebra
level.
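For reference, the string-template path described above can be sketched in plain Java. Everything here (the `Quad` record, the `%{g}`/`%{s}`/`%{p}`/`%{o}` placeholder syntax, the helper names) is hypothetical and only mirrors the description. In a real Jena application the filled-in string would still have to go through the SPARQL parser (for example `QueryFactory.create`), which is exactly the cost the question is trying to avoid.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the "string template" path: format quad values as
// SPARQL terms and splice them into a query string that must then be parsed.
public final class StringTemplateSketch {

    // Hypothetical stand-in for the application's quadruple object.
    public record Quad(String graph, String subject, String predicate, String objectLiteral) {}

    // Placeholders look like %{g}, %{s}, %{p}, %{o} (an arbitrary choice for this sketch).
    private static final Pattern PLACEHOLDER = Pattern.compile("%\\{([gspo])\\}");

    // <iri>; assumes the IRI is already valid (no IRI checking is done here).
    public static String formatIri(String iri) {
        return "<" + iri + ">";
    }

    // "lexical", escaping the four characters SPARQL forbids unescaped in a "..." literal.
    public static String formatLiteral(String lexical) {
        StringBuilder sb = new StringBuilder(lexical.length() + 2).append('"');
        for (int i = 0; i < lexical.length(); i++) {
            char c = lexical.charAt(i);
            switch (c) {
                case '\\' -> sb.append("\\\\");
                case '"' -> sb.append("\\\"");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                default -> sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Single pass over the template, so substituted text is never re-scanned.
    public static String fill(String template, Quad q) {
        Map<String, String> terms = Map.of(
                "g", formatIri(q.graph()),
                "s", formatIri(q.subject()),
                "p", formatIri(q.predicate()),
                "o", formatLiteral(q.objectLiteral()));
        Matcher m = PLACEHOLDER.matcher(template);
        return m.replaceAll(r -> Matcher.quoteReplacement(terms.get(r.group(1))));
    }
}
```

The literal escaping covers only what the grammar requires for a double-quoted string; IRIs are passed through unchecked, so a production node formatter would need more care.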

If Jena performs optimizations on queries, are they done right after
parsing or just before evaluation? In other words, when I give a query
to Algebra.exec(...), is it always optimized via a call to
Algebra#optimize?

Is there any builder to ease the construction of algebra?

I have also seen on a wiki page [1] that it is possible to work at the
syntax level. Do you think it is better to work at the syntax level or
at the algebra level for what I want to do?

[1] http://incubator.apache.org/jena/documentation/query/manipulating_sparql_using_arq.html

Kind Regards,

Laurent

Re: More information on query execution (with algebra)

Posted by Andy Seaborne <an...@apache.org>.

On 01/02/12 17:10, Laurent Pellegrino wrote:
> Hi,
>
> I compared two solutions for creating the algebra (one graph, with one
> BGP of 7 triple patterns and a filter with 9 conditions):
>
> a) Create the algebra from scratch (instantiating around 50 Java
> objects) each time I receive a quadruple.
>
> b) Use a template with placeholders, as you suggested. The template is
> created once. Placeholders are represented by Node_Var instances with
> dedicated names (there are 4 variables). Each time a quadruple is
> received, the template is rewritten using NodeTransformLib#transform
> with a custom NodeTransform that checks the name of each Node_Var and
> replaces it with the desired value if that name is one of the
> placeholders.
>
> I expected solution b), the one you (Andy) suggested, to be better
> than a) because, if I have understood how Transform works, several
> object instantiations can be avoided. Unfortunately, after running
> each solution 10^6 times (with JDK 7), solution b) is 11% slower
> than a).

The ways of actual Java performance are a bit of a mystery to me :-)
Object creation/deletion can be fast or slow depending on what the JIT
works out about scope.  And for code, CPU caches play a bigger part than
I'd expect, and "final" a smaller one.

If you get the chance, profiling each case would be informative; maybe
there's a hot spot somewhere.  It's quite easy to have an extra String
"+" in some loop, or to touch Locale indirectly (that's really expensive!).

	Andy
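The two hot-spot patterns mentioned above can be illustrated with plain JDK code. This is a generic sketch, not code from either application, and the `_:b` label format is made up: repeated String `+` inside a loop copies the accumulated text on every iteration, and `String.format` constructs a `java.util.Formatter` that looks up the default format Locale on each call. A profiler (or a harness such as JMH) is still the way to confirm whether either one matters in a given case.

```java
import java.util.List;

// Generic illustration of two common hot spots; not code from either application.
public final class HotLoopSketch {

    // '+' on a String inside a loop: each iteration copies everything built so far.
    public static String joinWithPlus(List<String> parts) {
        String s = "";
        for (String p : parts) {
            s = s + p + ";";
        }
        return s;
    }

    // Same result, appending into one growing buffer.
    public static String joinWithBuilder(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            sb.append(p).append(';');
        }
        return sb.toString();
    }

    // String.format creates a java.util.Formatter, which looks up the default
    // FORMAT Locale, on every call: a hidden Locale touch in a hot path.
    public static String labelWithFormat(int i) {
        return String.format("_:b%d", i);
    }

    // No Formatter, no Locale lookup.
    public static String labelWithAppend(int i) {
        return "_:b" + i;
    }
}
```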



Re: More information on query execution (with algebra)

Posted by Laurent Pellegrino <la...@gmail.com>.

Hi,

I compared two solutions for creating the algebra (one graph, with one
BGP of 7 triple patterns and a filter with 9 conditions):

a) Create the algebra from scratch (instantiating around 50 Java
objects) each time I receive a quadruple.

b) Use a template with placeholders, as you suggested. The template is
created once. Placeholders are represented by Node_Var instances with
dedicated names (there are 4 variables). Each time a quadruple is
received, the template is rewritten using NodeTransformLib#transform
with a custom NodeTransform that checks the name of each Node_Var and
replaces it with the desired value if that name is one of the
placeholders.

I expected solution b), the one you (Andy) suggested, to be better
than a) because, if I have understood how Transform works, several
object instantiations can be avoided. Unfortunately, after running
each solution 10^6 times (with JDK 7), solution b) is 11% slower
than a).
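To make the comparison concrete, here is a heavily simplified, stdlib-only model of approaches a) and b). The `Node`/`Var`/`Iri`/`Triple`/`GraphPattern` types, the `PH_` prefix, the `x` variable and the `http://example.org/linked` predicate are hypothetical stand-ins, not Jena's classes; in Jena the corresponding pieces are Node_Var, a custom NodeTransform and NodeTransformLib#transform. What the model does show: every triple that contains a placeholder still has to be rebuilt, and each visited node costs an extra lookup, so the template route need not allocate much less than building from scratch. It cannot say why b) measured 11% slower; only profiling can.

```java
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical, stdlib-only model of approaches (a) and (b); not Jena's classes.
public final class PlaceholderSketch {

    public interface Node {}
    public record Var(String name) implements Node {}
    public record Iri(String uri) implements Node {}

    public record Triple(Node s, Node p, Node o) {
        // Rebuild only if some position changed; otherwise keep this instance.
        public Triple map(UnaryOperator<Node> f) {
            Node s2 = f.apply(s), p2 = f.apply(p), o2 = f.apply(o);
            return (s2 == s && p2 == p && o2 == o) ? this : new Triple(s2, p2, o2);
        }
    }

    public record GraphPattern(Node graph, List<Triple> bgp) {}

    public record Quad(String g, String s, String p, String o) {}

    static final Iri LINKED = new Iri("http://example.org/linked");

    // (a) Build the whole pattern from scratch for every quad.
    public static GraphPattern buildFresh(Quad q) {
        return new GraphPattern(new Iri(q.g()), List.of(
                new Triple(new Iri(q.s()), new Iri(q.p()), new Var("x")),
                new Triple(new Var("x"), LINKED, new Iri(q.o()))));
    }

    // (b) Template built once; PH_* variables are the placeholders.
    public static final GraphPattern TEMPLATE = new GraphPattern(new Var("PH_g"), List.of(
            new Triple(new Var("PH_s"), new Var("PH_p"), new Var("x")),
            new Triple(new Var("x"), LINKED, new Var("PH_o"))));

    // Plays the role of a custom NodeTransform: swap a Var only if it is a placeholder.
    public static GraphPattern fromTemplate(Quad q) {
        Map<String, Node> values = Map.of(
                "PH_g", new Iri(q.g()), "PH_s", new Iri(q.s()),
                "PH_p", new Iri(q.p()), "PH_o", new Iri(q.o()));
        UnaryOperator<Node> transform = n ->
                (n instanceof Var v && values.containsKey(v.name())) ? values.get(v.name()) : n;
        return new GraphPattern(transform.apply(TEMPLATE.graph()),
                TEMPLATE.bgp().stream().map(t -> t.map(transform)).toList());
    }
}
```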

Laurent


Re: More information on query execution (with algebra)

Posted by Laurent Pellegrino <la...@gmail.com>.

Thanks for the information and advice.

> Which storage layer are you using?

I am using TDB with DatasetGraph and transactions.

Laurent


Re: More information on query execution (with algebra)

Posted by Andy Seaborne <an...@apache.org>.

On 30/01/12 16:02, Laurent Pellegrino wrote:
> Hi all,
>
> Some context on why I am asking for more information: I have an
> application where, each time it is called, a new SPARQL query (as a
> String) is created from a template and a quadruple (a Java object).
> This means that the relevant values from the quadruple have to be
> formatted with a node formatter and inserted into the SPARQL template.
> Then the SPARQL query has to be parsed and evaluated against a
> dataset.
>
> I was wondering whether I can skip the parsing step to save some time
> when the application runs. It seems possible by working at the algebra
> level.
>
> If Jena performs optimizations on queries, are they done right after
> parsing or just before evaluation? In other words, when I give a query
> to Algebra.exec(...), is it always optimized via a call to
> Algebra#optimize?

Optimizations are done at the start of execution.

They happen at the point when QueryEngineBase calls modifyOp.

In QueryEngineBase, modifyOp just returns the op unchanged.

In QueryEngineMain, which ARQ uses for general (mainly in-memory)
evaluation, modifyOp is a call to Algebra.optimize.

QueryEngineTDB extends QueryEngineMain.  It calls super.modifyOp and 
does some additional stuff.

QueryEngineSDB inherits from QueryEngineBase, so its modifyOp does
nothing.  Its processing is done earlier (for historical reasons) in
QueryEngineSDB.init, and it calls a couple of optimizations directly.
It does not want the join optimizations.
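The hook structure described above is a template method. The sketch below models it with hypothetical, simplified classes (`BaseEngine`, `MainEngine`, `TdbEngine`, `SdbEngine`, and an `Op` that merely records which rewrites ran); it is not ARQ's code and leaves out everything except where each engine's rewriting happens.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the modifyOp hook described above; not ARQ code.
public final class EngineHierarchySketch {

    // Stand-in for an algebra expression: only records which rewrites were applied.
    public record Op(List<String> rewrites) {
        public Op with(String rewrite) {
            List<String> next = new ArrayList<>(rewrites);
            next.add(rewrite);
            return new Op(List.copyOf(next));
        }
    }

    // Like QueryEngineBase as described: the hook returns the op unchanged.
    public static class BaseEngine {
        protected Op modifyOp(Op op) { return op; }

        // Called at the start of execution.
        public final Op prepare(Op op) { return modifyOp(op); }
    }

    // Like QueryEngineMain: the hook runs the general optimizer.
    public static class MainEngine extends BaseEngine {
        @Override protected Op modifyOp(Op op) { return op.with("optimize"); }
    }

    // Like QueryEngineTDB: general optimization first, then engine-specific work.
    public static class TdbEngine extends MainEngine {
        @Override protected Op modifyOp(Op op) { return super.modifyOp(op).with("tdb-specific"); }
    }

    // Like QueryEngineSDB: keeps the no-op hook and applies selected rewrites earlier.
    public static class SdbEngine extends BaseEngine {
        public Op init(Op op) { return op.with("selected-rewrites"); }
    }
}
```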

You can replace the optimizer, even on a per-execution basis: see
ARQConstants.sysOptimizerFactory and Optimize.decideOptimizer.  Or you
can turn off various optimizations by setting symbols in the context;
see Optimize.rewrite.
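Choosing the optimizer per execution, or switching individual rewrites off through the context, can be modelled as a lookup that prefers the per-execution context and falls back to a default. The `Context` class, the `optimizerFactory` key and the `filterPlacement`/`joinStrategy` names below are hypothetical stand-ins; ARQ's real entry points are the ones named above (ARQConstants.sysOptimizerFactory, Optimize.decideOptimizer, Optimize.rewrite), and their exact behaviour should be checked in the ARQ source.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical model of per-execution optimizer choice; not ARQ's API.
public final class OptimizerChoiceSketch {

    // Hypothetical context keys (ARQ uses its own Symbol constants for these roles).
    public static final String OPTIMIZER_KEY = "optimizerFactory";
    public static final List<String> REWRITES = List.of("filterPlacement", "joinStrategy");

    // Minimal context: per-execution values, falling back to a parent (e.g. global) context.
    public static final class Context {
        private final Map<String, Object> values = new HashMap<>();
        private final Context parent;

        public Context(Context parent) { this.parent = parent; }

        public Context set(String key, Object value) { values.put(key, value); return this; }

        public Object get(String key) {
            if (values.containsKey(key)) return values.get(key);
            return parent == null ? null : parent.get(key);
        }
    }

    // An optimizer maps an op (here just a string) to a rewritten op, given a context.
    public interface Optimizer extends BiFunction<String, Context, String> {}

    // Default optimizer: apply each rewrite unless the context switches it off with FALSE.
    public static final Optimizer DEFAULT = (op, cxt) -> {
        String result = op;
        for (String rewrite : REWRITES) {
            if (!Boolean.FALSE.equals(cxt.get(rewrite))) {
                result = rewrite + "(" + result + ")";
            }
        }
        return result;
    };

    // An optimizer stored in the context wins; otherwise use the default.
    public static Optimizer decideOptimizer(Context cxt) {
        return cxt.get(OPTIMIZER_KEY) instanceof Optimizer o ? o : DEFAULT;
    }

    public static String optimize(String op, Context cxt) {
        return decideOptimizer(cxt).apply(op, cxt);
    }
}
```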

> Is there any builder to ease the construction of algebra?

One way might be to construct the algebra using placeholders (well-known 
nodes), then use a Transform to change it.
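A minimal sketch of the placeholder-plus-Transform idea at the operator level, assuming a toy algebra (`Bgp`, `Filter`, `Join`) that only loosely resembles ARQ's OpBGP/OpFilter/OpJoin. The walk is bottom-up and copy-on-change: a node is rebuilt only when one of its children (or its own content) changed, so untouched subtrees are shared between the template and the result. The `PH_cond` placeholder and the expression strings are made up for the example.

```java
import java.util.List;
import java.util.Map;

// Hypothetical toy algebra and bottom-up transformer; not ARQ's classes.
public final class OpTransformSketch {

    public interface Op {}
    public record Bgp(List<String> patterns) implements Op {}
    public record Filter(String expr, Op sub) implements Op {}
    public record Join(Op left, Op right) implements Op {}

    // Called on each node after its children have been transformed.
    // Defaults rebuild a node only if a child actually changed.
    public interface Transform {
        default Op transform(Bgp op) { return op; }
        default Op transform(Filter op, Op newSub) {
            return newSub == op.sub() ? op : new Filter(op.expr(), newSub);
        }
        default Op transform(Join op, Op newLeft, Op newRight) {
            return (newLeft == op.left() && newRight == op.right()) ? op : new Join(newLeft, newRight);
        }
    }

    // Bottom-up walk: unchanged subtrees come back as the same instances.
    public static Op walk(Op op, Transform t) {
        if (op instanceof Bgp b) return t.transform(b);
        if (op instanceof Filter f) return t.transform(f, walk(f.sub(), t));
        if (op instanceof Join j) return t.transform(j, walk(j.left(), t), walk(j.right(), t));
        throw new IllegalArgumentException("unknown op: " + op);
    }

    // Replaces a filter expression that is exactly a placeholder key.
    public static Transform fillFilterPlaceholders(Map<String, String> values) {
        return new Transform() {
            @Override
            public Op transform(Filter op, Op newSub) {
                String expr = values.getOrDefault(op.expr(), op.expr());
                boolean unchanged = expr.equals(op.expr()) && newSub == op.sub();
                return unchanged ? op : new Filter(expr, newSub);
            }
        };
    }
}
```

Because the template tree is never mutated, it can be built once and reused for every incoming quadruple.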

> I have also seen on a wiki page [1] that it is possible to work at the
> syntax level. Do you think it is better to work at the syntax level or
> at the algebra level for what I want to do?

Algebra.

>
> [1] http://incubator.apache.org/jena/documentation/query/manipulating_sparql_using_arq.html
>
> Kind Regards,
>
> Laurent

Which storage layer are you using?

	Andy