Posted to dev@jena.apache.org by "Andy Seaborne (JIRA)" <ji...@apache.org> on 2014/10/01 12:54:34 UTC

[jira] [Closed] (JENA-779) Filter placement should be able to break up extend

     [ https://issues.apache.org/jira/browse/JENA-779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Seaborne closed JENA-779.
------------------------------

> Filter placement should be able to break up extend
> --------------------------------------------------
>
>                 Key: JENA-779
>                 URL: https://issues.apache.org/jira/browse/JENA-779
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: ARQ, Optimizer
>    Affects Versions: Jena 2.12.0
>            Reporter: Rob Vesse
>            Assignee: Andy Seaborne
>             Fix For: Jena 2.12.1
>
>         Attachments: JENA-779-filter-extend-extend, JENA-779-filter-extend_distinct.patch, JENA-779-single-extend.patch, JENA-779.patch
>
>
> The following query produces a query plan, seen internally, that we consider sub-optimal:
> {noformat}
> SELECT DISTINCT ?domainName
> {
>   { ?uri ?p ?o }
>   UNION
>   {
>     ?sub ?p ?uri
>     FILTER(isIRI(?uri))
>   }
>   BIND(str(?uri) as ?s)
>   FILTER(STRSTARTS(?s, "http://"))
>   BIND(IRI(CONCAT("http://", STRBEFORE(SUBSTR(?s,8), "/"))) AS ?domainName)
> }
> {noformat}
> ARQ optimises it as follows:
> {noformat}
> (distinct
>   (project (?domainName)
>     (filter (strstarts ?s "http://")
>       (extend ((?s (str ?uri)) (?domainName (iri (concat "http://" (strbefore (substr ?s 8) "/")))))
>         (union
>           (bgp (triple ?uri ?p ?o))
>           (filter (isIRI ?uri)
>             (bgp (triple ?sub ?p ?uri))))))))
> {noformat}
> This makes the query engine do a lot of work: it computes both {{BIND}} expressions for many candidate solutions that the filter then rejects, when for many of them only the first, simple {{BIND}} expression would need to be computed.
> It would be better if the query was planned as follows:
> {noformat}
> (distinct
>   (project (?domainName)
>     (extend (?domainName (iri (concat "http://" (strbefore (substr ?s 8) "/"))))
>       (filter (strstarts ?s "http://")
>         (extend (?s (str ?uri))
>           (union
>             (bgp (triple ?uri ?p ?o))
>             (filter (isIRI ?uri)
>               (bgp (triple ?sub ?p ?uri)))))))))
> {noformat}
> Essentially, when we try to push a filter through an {{extend}} and determine that we cannot push it through the whole {{extend}}, we should check whether we can split the {{extend}} instead, which results in a partial push.
> Note that a user can re-write the original query to yield this plan if they make the second {{BIND}} a project expression like so:
> {noformat}
> SELECT DISTINCT (IRI(CONCAT("http://", STRBEFORE(SUBSTR(?s,8), "/"))) AS ?domainName)
> {
>   { ?uri ?p ?o }
>   UNION
>   {
>     ?sub ?p ?uri
>     FILTER(isIRI(?uri))
>   }
>   BIND(str(?uri) as ?s)
>   FILTER(STRSTARTS(?s, "http://"))
> }
> {noformat}
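
The partial push described in the quoted issue can be illustrated with a small, self-contained sketch. It is not ARQ's implementation and uses no Jena API: the function name split_extend and the representation of an extend as an ordered list of (variable, expression text) pairs are assumptions made purely for illustration. The idea is to cut the extend just after the last assignment that defines a variable the filter mentions, so the filter can sit between the two halves.

```python
from typing import List, Set, Tuple

# Hypothetical model: an extend is an ordered list of (variable, expression text).
Assignment = Tuple[str, str]


def split_extend(
    assignments: List[Assignment], filter_vars: Set[str]
) -> Tuple[List[Assignment], List[Assignment]]:
    """Split an extend so a filter can be placed between the two parts.

    Returns (below, above). `below` is the shortest prefix that defines every
    filter variable introduced by this extend; `above` is the remainder.
    Assignment order is preserved, so later expressions can still refer to
    variables bound by earlier ones.
    """
    # Positions of assignments whose variable is mentioned by the filter.
    needed = [i for i, (var, _expr) in enumerate(assignments) if var in filter_vars]
    if not needed:
        # The filter uses nothing from this extend: it can go below all of it.
        return [], list(assignments)
    cut = needed[-1] + 1
    return list(assignments[:cut]), list(assignments[cut:])


# The extend and filter from the plan in the issue description.
extend = [
    ("?s", "(str ?uri)"),
    ("?domainName", '(iri (concat "http://" (strbefore (substr ?s 8) "/")))'),
]
below, above = split_extend(extend, {"?s"})
```

Here `below` holds only the ?s assignment and `above` holds only the ?domainName assignment, which corresponds to the shape (extend above (filter (extend below ...))) proposed in the issue. If `above` comes back empty, the filter depends on the last assignment and no split helps.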



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)