Posted to dev@jena.apache.org by "Rob Vesse (JIRA)" <ji...@apache.org> on 2014/06/04 16:59:01 UTC

[jira] [Closed] (JENA-709) Index join strategy may need to be more conservative when some sequence elements are potentially expensive

     [ https://issues.apache.org/jira/browse/JENA-709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rob Vesse closed JENA-709.
--------------------------

       Resolution: Not a Problem
    Fix Version/s: Jena 2.11.2

Closing as Not a Problem.

As Andy has rightly pointed out, this actually appears to be related to JENA-633.

In this case, because of the cross product, it did not produce incorrect results but instead led to performance degradation.

> Index join strategy may need to be more conservative when some sequence elements are potentially expensive
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: JENA-709
>                 URL: https://issues.apache.org/jira/browse/JENA-709
>             Project: Apache Jena
>          Issue Type: Brainstorming
>          Components: ARQ, Optimizer
>    Affects Versions: Jena 2.11.1
>            Reporter: Rob Vesse
>            Priority: Minor
>             Fix For: Jena 2.11.2
>
>
> As noted in a discussion of a poorly performing query on a mailing list thread (http://s.apache.org/cAn), there are cases where introducing {{sequence}} can actually make a query slower because some elements in the {{sequence}} are expensive to calculate, e.g. sub-queries.
> The example query given is:
> {noformat}
> SELECT DISTINCT ?O ?T  ?E
> WHERE
> {  
>   ?E a x:E. 
>   {
>     SELECT ?O ?T 
>     WHERE 
>     {
>       ?O :oE ?E ;
>             :oT ?T .
>     } 
>     ORDER BY DESC(?T)
>     LIMIT 3
>   }
> }
> {noformat}
> Which produces the following algebra:
> {noformat}
> (distinct
>  (project (?O ?T ?E)
>   (sequence
>    (bgp (triple ?E rdf:type x:E))
>    (project (?O ?T)
>     (top (3 (desc ?T))
>      (bgp
>       (triple ?O :oE ?/E)
>       (triple ?O :oT ?T)
>      ))))))
> {noformat}
> Because scoping leaves no common variables, substituting the bindings from the first sequence element into the sub-query has no effect, so the expensive sub-query (note the {{top}} operator) gets executed in full for every single LHS solution.
> It is unclear from the discussion thread so far whether this is just a badly written query, and we don't have an example dataset that demonstrates the performance problems. Just looking at the algebra, though, it seems we would be better off avoiding {{sequence}} in favour of a plain {{join}} in a case like this.
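
The cost difference described in the issue can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Jena or ARQ code: solutions are modelled as Python dicts, the data is invented, and the names make_subquery, sequence_join and plain_join are hypothetical. It counts how often the expensive right-hand side is evaluated when it is re-run once per LHS solution versus evaluated once and then joined.

```python
# Toy model only: NOT Jena/ARQ code. All names and data here are hypothetical.

def compatible(a, b):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())

def make_subquery(triples, counter):
    """Model the sub-query: top 3 (?O, ?T) rows ordered by ?T descending."""
    def run(binding):
        counter["runs"] += 1
        # The inner ?E is renamed by scoping (?/E in the algebra), so the
        # incoming binding cannot restrict anything: a full scan and sort.
        rows = [{"O": o, "T": t} for (o, _e, t) in triples]
        rows.sort(key=lambda r: r["T"], reverse=True)
        return rows[:3]
    return run

def sequence_join(lhs, rhs):
    """Substitution style: evaluate rhs again for every LHS solution."""
    return [{**left, **right}
            for left in lhs
            for right in rhs(left)
            if compatible(left, right)]

def plain_join(lhs, rhs):
    """Evaluate rhs once, then combine compatible solutions."""
    right_rows = rhs({})
    return [{**left, **right}
            for left in lhs
            for right in right_rows
            if compatible(left, right)]

# Invented data: 100 LHS solutions for ?E and 1000 (?O, ?E, ?T) rows.
lhs = [{"E": f"e{i}"} for i in range(100)]
triples = [(f"o{i}", f"e{i % 100}", i) for i in range(1000)]

seq_runs = {"runs": 0}
seq_rows = sequence_join(lhs, make_subquery(triples, seq_runs))

join_runs = {"runs": 0}
join_rows = plain_join(lhs, make_subquery(triples, join_runs))

print("sub-query evaluations, sequence style:", seq_runs["runs"])
print("sub-query evaluations, plain join:", join_runs["runs"])
print("same results:", seq_rows == join_rows)
```

In this model both strategies return the same cross product, but the substitution style pays for the scan and sort once per LHS solution, which matches the behaviour the issue describes for a sub-query that shares no variables with the left-hand side.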



--
This message was sent by Atlassian JIRA
(v6.2#6252)