Posted to dev@jena.apache.org by "Andy Seaborne (JIRA)" <ji...@apache.org> on 2014/06/30 13:55:25 UTC

[jira] [Commented] (JENA-734) Filter Pushing should not apply a filter multiple times when the expression is not stable e.g. RAND()

    [ https://issues.apache.org/jira/browse/JENA-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047581#comment-14047581 ] 

Andy Seaborne commented on JENA-734:
------------------------------------

{{table unit}} is correct.  The filter's expression is a function, all of whose variables are satisfied, so it is placed before the first triple pattern.  A filter always has to filter something, as well as the incoming stream from the {{sequence}}; here that something is {{table unit}}.
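
As an illustration of this point, here is a minimal Python sketch (not ARQ code) that models a solution table as a list of binding dicts. The names TABLE_UNIT, filter_op and sequence are hypothetical stand-ins for the algebra operators, and the sample bindings are made up. {{table unit}} is modelled as the table holding one row with no bindings, so it gives the filter something to filter without changing what the {{sequence}} produces.

```python
# Hypothetical model of the algebra: a table is a list of bindings (dicts).
TABLE_UNIT = [{}]  # one row, no variables bound: the identity for a join

def filter_op(predicate, table):
    """Keep the rows of `table` for which `predicate(row)` is true."""
    return [row for row in table if predicate(row)]

def sequence(left, right):
    """Join each left row with each compatible right row, in order."""
    out = []
    for l in left:
        for r in right:
            # Rows are compatible if they agree on every shared variable.
            if all(l[k] == r[k] for k in l.keys() & r.keys()):
                out.append({**l, **r})
    return out

# Made-up matches standing in for the two-triple BGP.
bgp = [{"s1": "a", "o1": "x1"}, {"s1": "b", "o1": "y2"}]

# A variable-free filter that accepts: the unit row passes through unchanged.
accepted = sequence(filter_op(lambda row: True, TABLE_UNIT), bgp)
# The same filter rejecting: the unit row is dropped, so nothing follows it.
rejected = sequence(filter_op(lambda row: False, TABLE_UNIT), bgp)

print(accepted == bgp)  # True
print(rejected)         # []
```

In this model, replacing the constant predicates with a rand() test means the filter over the unit table is evaluated once, so that branch yields either every BGP match or none.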


> Filter Pushing should not apply a filter multiple times when the expression is not stable e.g. RAND()
> -----------------------------------------------------------------------------------------------------
>
>                 Key: JENA-734
>                 URL: https://issues.apache.org/jira/browse/JENA-734
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: ARQ
>    Affects Versions: Jena 2.11.2
>            Reporter: Rob Vesse
>             Fix For: Jena 2.12.0
>
>
> In our internal testing of 2.11.2 we've encountered a query where the new, more aggressive filter pushing behaviour causes an incorrect query plan to be produced.
> The raw SPARQL query is as follows:
> {noformat}
> SELECT ?s1 ?s2
> WHERE
> {
> {SELECT  ?s1 ?group1
> WHERE
> { ?s1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://localhost:2020/vocab/tbl9sbe_he8g_174> .
>   ?s1 <http://localhost:2020/vocab/tbl9sbe_he8g_174_Col_0> ?o1
>   BIND(substr(str(?o1), 1, 2) AS ?group1)
> }
> }
> {SELECT  ?s2 ?group2
> WHERE
> { ?s2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://localhost:2020/vocab/tbl9sbe_he8g_174> .
>  ?s2 <http://localhost:2020/vocab/tbl9sbe_he8g_174_Col_0> ?o2
>   BIND(substr(str(?o2), 1, 2) AS ?group2)
> }
> LIMIT   1000
> }
> FILTER ( (?group1 = ?group2) && ( rand() < 0.10 ) )
> }
> {noformat}
> The unoptimised algebra is as follows:
> {noformat}
> (project (?s1 ?s2)
>   (filter (&& (= ?group1 ?group2) (< (rand) 0.10))
>     (join
>       (project (?s1 ?group1)
>         (extend ((?group1 (substr (str ?o1) 1 2)))
>           (bgp
>             (triple ?s1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://localhost:2020/vocab/tbl9sbe_he8g_174>)
>             (triple ?s1 <http://localhost:2020/vocab/tbl9sbe_he8g_174_Col_0> ?o1)
>           )))
>       (slice _ 1000
>         (project (?s2 ?group2)
>           (extend ((?group2 (substr (str ?o2) 1 2)))
>             (bgp
>               (triple ?s2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://localhost:2020/vocab/tbl9sbe_he8g_174>)
>               (triple ?s2 <http://localhost:2020/vocab/tbl9sbe_he8g_174_Col_0> ?o2)
>             )))))))
> {noformat}
> However ARQ optimises this to the following:
> {noformat}
> (project (?s1 ?s2)
>   (filter (= ?group1 ?group2)
>     (join
>       (project (?s1 ?group1)
>         (extend ((?group1 (substr (str ?/o1) 1 2)))
>           (sequence
>             (filter (< (rand) 0.10)
>               (table unit))
>             (bgp
>               (triple ?s1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://localhost:2020/vocab/tbl9sbe_he8g_174>)
>               (triple ?s1 <http://localhost:2020/vocab/tbl9sbe_he8g_174_Col_0> ?/o1)
>             ))))
>       (filter (< (rand) 0.10)
>         (slice _ 1000
>           (project (?s2 ?group2)
>             (extend ((?group2 (substr (str ?/o2) 1 2)))
>               (bgp
>                 (triple ?s2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://localhost:2020/vocab/tbl9sbe_he8g_174>)
>                 (triple ?s2 <http://localhost:2020/vocab/tbl9sbe_he8g_174_Col_0> ?/o2)
>               ))))))))
> {noformat}
> Note that the filter clause with the {{rand}} is applied twice as a result of the filter pushing.  As {{rand}} is not a stable function, pushing an expression containing it so that it is applied twice leads to unpredictable results.
> Note also that for this query the filter pushing introduces a {{table unit}}; I am unclear where it comes from and whether it is valid.
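
To make the cost of the double application concrete, here is a rough simulation. It assumes each rand() call is an independent uniform draw in [0, 1), and it uses Python's random module rather than ARQ; the function name pass_rate and the seed are made up for this sketch. It treats each trial as one candidate result that must survive the rand() < 0.10 test once (as written in the query) or twice (as in the optimised plan).

```python
import random

def pass_rate(applications, trials=100_000, threshold=0.10, seed=734):
    """Fraction of trials that survive `applications` independent rand() tests."""
    rng = random.Random(seed)
    kept = sum(
        # A trial survives only if every application of the test passes.
        all(rng.random() < threshold for _ in range(applications))
        for _ in range(trials)
    )
    return kept / trials

once = pass_rate(1)   # filter evaluated once, as written in the query
twice = pass_rate(2)  # filter evaluated twice, as in the optimised plan

print(f"once:  {once:.4f}")
print(f"twice: {twice:.4f}")
```

Under these assumptions the expected survival rate drops from 0.10 to 0.10 * 0.10 = 0.01, so the pushed plan does not sample the same fraction of results as the original query.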


