Posted to users@jena.apache.org by Osma Suominen <os...@helsinki.fi> on 2017/03/09 14:48:47 UTC

Performance regression between Jena 3.1.0 and 3.2.0

Hi,

I wanted to report a performance regression I found. It was probably
caused by a change to the query optimizer during Jena 3.1.1
development. The change itself may be rather benign, but the result was
a severe performance regression in my application.

With YSO [1] as data loaded into TDB, this query:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT *
WHERE {
   <http://www.yso.fi/onto/yso/p8627> ?p ?o .
   OPTIONAL {
     { ?p rdfs:subPropertyOf ?pp }
     UNION
     { ?o a ?ot }
   }
}

takes about 300 ms on Jena 3.2.0, while it took only around 25 ms on 
Jena 3.1.0.

The fix was to separate the single OPTIONAL block into two:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT *
WHERE {
   <http://www.yso.fi/onto/yso/p8627> ?p ?o .
   OPTIONAL { ?p rdfs:subPropertyOf ?pp }
   OPTIONAL { ?o a ?ot }
}

The result is that both Jena versions execute the query in around 25 ms.
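Note that the two forms are not strictly equivalent in the rows they return. The following is a minimal sketch of the standard SPARQL algebra semantics (OPTIONAL as a left join, UNION as concatenation of solution mappings) in Python. The three triples are hypothetical stand-ins, not actual YSO data, and the helper names are made up for illustration.

```python
# Minimal sketch of SPARQL algebra over solution mappings (Python dicts).
# The triples below are hypothetical stand-ins, not actual YSO data.

SUBPROP = "rdfs:subPropertyOf"
RDF_TYPE = "rdf:type"

TRIPLES = [
    ("yso:p8627", "skos:broader", "yso:p1234"),
    ("skos:broader", SUBPROP, "skos:semanticRelation"),
    ("yso:p1234", RDF_TYPE, "skos:Concept"),
]


def match(pattern, triples):
    """All solution mappings for one triple pattern (variables start with '?')."""
    solutions = []
    for triple in triples:
        binding = {}
        for p_term, t_term in zip(pattern, triple):
            if p_term.startswith("?"):
                if binding.setdefault(p_term, t_term) != t_term:
                    break  # repeated variable bound to two different terms
            elif p_term != t_term:
                break  # constant term does not match
        else:
            solutions.append(binding)
    return solutions


def compatible(a, b):
    """Two mappings are compatible if they agree on every shared variable."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())


def left_join(left, right):
    """OPTIONAL: keep each left mapping, extended by every compatible right one."""
    out = []
    for a in left:
        extended = [{**a, **b} for b in right if compatible(a, b)]
        out.extend(extended or [a])
    return out


def union(left, right):
    """UNION: concatenate the solutions of both branches."""
    return left + right


base = match(("yso:p8627", "?p", "?o"), TRIPLES)
sub_props = match(("?p", SUBPROP, "?pp"), TRIPLES)
types = match(("?o", RDF_TYPE, "?ot"), TRIPLES)

# OPTIONAL { { ?p rdfs:subPropertyOf ?pp } UNION { ?o a ?ot } }
single_optional = left_join(base, union(sub_props, types))
# OPTIONAL { ?p rdfs:subPropertyOf ?pp }  OPTIONAL { ?o a ?ot }
split_optional = left_join(left_join(base, sub_props), types)

if __name__ == "__main__":
    print("single OPTIONAL:", single_optional)
    print("split OPTIONALs:", split_optional)
```

On this toy data the single OPTIONAL yields two rows, one with only ?pp bound and one with only ?ot bound, while the split form yields one row with both bound. With several matches per pattern, the split form returns their cross product.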

You may wonder why I had a query like that in the first place. It is not
the actual query I started with, which is a much more complex CONSTRUCT
query with many UNIONs within the OPTIONAL block (see [2]).

The important thing was to separate the OPTIONAL block dealing with ?p
from the OPTIONAL block dealing with ?o. As long as a block only deals
with one variable from the pattern above, it may contain multiple UNIONs.
In fact, using UNIONs there makes sense, because it avoids internal cross
products and combinatorial explosion when there are multiple solutions
for each pattern.
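A rough per-solution row count makes the cross-product argument concrete. This is a back-of-the-envelope sketch, assuming the patterns are independent of each other, each binds its own new variables, and the hypothetical input `counts` holds the number of matches of each pattern for one solution of the outer triple pattern.

```python
from math import prod


def rows_with_union(counts):
    """Rows per outer solution for OPTIONAL { {A1} UNION {A2} UNION ... }.

    UNION concatenates the branch results; if every branch is empty, the
    OPTIONAL still keeps the outer solution once.
    """
    return max(1, sum(counts))


def rows_with_chained_optionals(counts):
    """Rows per outer solution for OPTIONAL {A1} OPTIONAL {A2} ...

    Each OPTIONAL multiplies the rows by its match count, or leaves them
    unchanged (factor 1) when it matches nothing.
    """
    return prod(max(1, c) for c in counts)


if __name__ == "__main__":
    counts = [3, 4, 5]  # hypothetical match counts for three patterns on ?p
    print(rows_with_union(counts), rows_with_chained_optionals(counts))  # 12 60
```

With 3, 4 and 5 matches, the UNION form produces 12 rows per outer solution, while chained OPTIONALs produce 60, and the gap grows with every pattern added.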

-Osma


[1] http://api.finto.fi/download/yso/yso-skos.ttl

[2] https://github.com/NatLibFi/Skosmos/blob/master/model/sparql/GenericSparql.php#L404

-- 
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suominen@helsinki.fi
http://www.nationallibrary.fi

Re: Performance regression between Jena 3.1.0 and 3.2.0

Posted by Andy Seaborne <an...@apache.org>.

On 09/03/17 14:48, Osma Suominen wrote:
> Hi,
>
> I wanted to report a performance regression I found. It was probably
> caused by a change to the query optimizer during Jena 3.1.1
> development. The change itself may be rather benign, but the result was
> a severe performance regression in my application.

It is a consequence of the more cautious optimization. The optimizer
does not distinguish the case of a UNION leaving variables bound in some
solutions and not others from the case of variables being set in nested
OPTIONALs.
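One way to picture this distinction is a coarse static analysis that records, for each pattern, which variables are certainly bound and which are only possibly bound. The sketch below is hypothetical: it is not Jena's optimizer code, and the class names are made up. In it, ?o and ?ot end up "possibly bound" both under a UNION and under a nested OPTIONAL, so an analysis at this granularity cannot treat the two cases differently.

```python
from dataclasses import dataclass

# Hypothetical pattern-tree classes for illustration only; they are not
# Jena's Op classes and this is not Jena's optimizer logic.


@dataclass(frozen=True)
class Triple:
    s: str
    p: str
    o: str


@dataclass(frozen=True)
class Union:
    left: object
    right: object


@dataclass(frozen=True)
class LeftJoin:  # OPTIONAL
    left: object
    right: object


def var_scope(pattern):
    """Return (certain, possible) variable sets for a pattern.

    certain: bound in every solution; possible: bound in at least one.
    """
    if isinstance(pattern, Triple):
        found = frozenset(t for t in (pattern.s, pattern.p, pattern.o) if t.startswith("?"))
        return found, found
    left_certain, left_possible = var_scope(pattern.left)
    right_certain, right_possible = var_scope(pattern.right)
    possible = left_possible | right_possible
    if isinstance(pattern, Union):
        # Only variables bound by both branches are certain.
        return left_certain & right_certain, possible
    if isinstance(pattern, LeftJoin):
        # The optional side may match nothing, so it adds no certain variables.
        return left_certain, possible
    raise TypeError(f"unsupported pattern: {pattern!r}")


sub = Triple("?p", "rdfs:subPropertyOf", "?pp")
typ = Triple("?o", "rdf:type", "?ot")

via_union = var_scope(Union(sub, typ))        # { ?p ... } UNION { ?o ... }
via_optional = var_scope(LeftJoin(sub, typ))  # { ?p ... OPTIONAL { ?o ... } }

# Variables that are possibly, but not certainly, bound.
maybe_union = via_union[1] - via_union[0]
maybe_optional = via_optional[1] - via_optional[0]
```

In the UNION case, even ?p and ?o, the variables shared with the outer triple pattern, come out as only possibly bound.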

IMO the rewrite is better anyway.

Thanks for reporting it. It is useful information for any future
optimization work, but I can't see a limited-scope fix that could be
applied. I have it set up for investigation locally.

     Andy
