Posted to dev@jena.apache.org by "Andy Seaborne (JIRA)" <ji...@apache.org> on 2014/06/30 13:55:25 UTC
[jira] [Commented] (JENA-734) Filter Pushing should not apply a
filter multiple times when the expression is not stable e.g. RAND()
[ https://issues.apache.org/jira/browse/JENA-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047581#comment-14047581 ]
Andy Seaborne commented on JENA-734:
------------------------------------
{{table unit}} is correct. The optimiser is putting a filter expression, all of whose variables are already satisfied, before the first triple pattern. A filter always has to filter something, in addition to the incoming stream from the {{sequence}}, and {{table unit}} is that something.
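As a rough illustration of why {{table unit}} is harmless there, here is a minimal Python sketch. It is not ARQ code: it assumes a toy model in which a table is a list of binding dictionaries, and the names join, filter_rows, TABLE_UNIT and TABLE_EMPTY are made up for this example. {{table unit}} is modelled as one row with no bindings, so joining it with anything returns the other side unchanged.

```python
# Toy model (an assumption, not ARQ's implementation):
# a table is a list of rows, a row is a dict of variable -> value.

TABLE_UNIT = [{}]   # one row, no bindings: the identity for join
TABLE_EMPTY = []    # no rows at all


def compatible(left, right):
    """Two rows are compatible if they agree on every shared variable."""
    return all(right[k] == v for k, v in left.items() if k in right)


def join(left_rows, right_rows):
    """Join two tables by merging every pair of compatible rows."""
    return [
        {**l, **r}
        for l in left_rows
        for r in right_rows
        if compatible(l, r)
    ]


def filter_rows(rows, predicate):
    """Keep only the rows for which the predicate holds."""
    return [row for row in rows if predicate(row)]
```

In this model, a filter over TABLE_UNIT that passes leaves the single empty row, and joining that with the rows of the following pattern gives back exactly those rows. A filter that fails leaves TABLE_EMPTY, and the join then produces nothing.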
> Filter Pushing should not apply a filter multiple times when the expression is not stable e.g. RAND()
> -----------------------------------------------------------------------------------------------------
>
> Key: JENA-734
> URL: https://issues.apache.org/jira/browse/JENA-734
> Project: Apache Jena
> Issue Type: Bug
> Components: ARQ
> Affects Versions: Jena 2.11.2
> Reporter: Rob Vesse
> Fix For: Jena 2.12.0
>
>
> In our internal testing of 2.11.2 we've encountered a query where the new, more aggressive filter pushing behaviour causes an incorrect query plan to be produced.
> The raw SPARQL query is as follows:
> {noformat}
> SELECT ?s1 ?s2
> WHERE
>   {
>     {SELECT ?s1 ?group1
>       WHERE
>         { ?s1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://localhost:2020/vocab/tbl9sbe_he8g_174> .
>           ?s1 <http://localhost:2020/vocab/tbl9sbe_he8g_174_Col_0> ?o1
>           BIND(substr(str(?o1), 1, 2) AS ?group1)
>         }
>     }
>     {SELECT ?s2 ?group2
>       WHERE
>         { ?s2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://localhost:2020/vocab/tbl9sbe_he8g_174> .
>           ?s2 <http://localhost:2020/vocab/tbl9sbe_he8g_174_Col_0> ?o2
>           BIND(substr(str(?o2), 1, 2) AS ?group2)
>         }
>       LIMIT 1000
>     }
>     FILTER ( (?group1 = ?group2) && ( rand() < 0.10 ) )
>   }
> {noformat}
> The unoptimised algebra is as follows:
> {noformat}
> (project (?s1 ?s2)
>   (filter (&& (= ?group1 ?group2) (< (rand) 0.10))
>     (join
>       (project (?s1 ?group1)
>         (extend ((?group1 (substr (str ?o1) 1 2)))
>           (bgp
>             (triple ?s1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://localhost:2020/vocab/tbl9sbe_he8g_174>)
>             (triple ?s1 <http://localhost:2020/vocab/tbl9sbe_he8g_174_Col_0> ?o1)
>           )))
>       (slice _ 1000
>         (project (?s2 ?group2)
>           (extend ((?group2 (substr (str ?o2) 1 2)))
>             (bgp
>               (triple ?s2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://localhost:2020/vocab/tbl9sbe_he8g_174>)
>               (triple ?s2 <http://localhost:2020/vocab/tbl9sbe_he8g_174_Col_0> ?o2)
>             )))))))
> {noformat}
> However ARQ optimises this to the following:
> {noformat}
> (project (?s1 ?s2)
>   (filter (= ?group1 ?group2)
>     (join
>       (project (?s1 ?group1)
>         (extend ((?group1 (substr (str ?/o1) 1 2)))
>           (sequence
>             (filter (< (rand) 0.10)
>               (table unit))
>             (bgp
>               (triple ?s1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://localhost:2020/vocab/tbl9sbe_he8g_174>)
>               (triple ?s1 <http://localhost:2020/vocab/tbl9sbe_he8g_174_Col_0> ?/o1)
>             ))))
>       (filter (< (rand) 0.10)
>         (slice _ 1000
>           (project (?s2 ?group2)
>             (extend ((?group2 (substr (str ?/o2) 1 2)))
>               (bgp
>                 (triple ?s2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://localhost:2020/vocab/tbl9sbe_he8g_174>)
>                 (triple ?s2 <http://localhost:2020/vocab/tbl9sbe_he8g_174_Col_0> ?/o2)
>               ))))))))
> {noformat}
> Note that the filter clause with the {{rand}} gets applied twice as a result of the filter pushing. As {{rand}} is not a stable function, pushing an expression containing it such that it is applied twice leads to unpredictable results.
> Note also that for this query the filter pushing introduces a {{table unit}}; I am unclear where it comes from and whether it is valid.
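The effect of applying the {{rand}} filter twice can be sketched with a small Python simulation. This is only a model under stated assumptions, not ARQ code: the join is treated as a plain cross product, the {{?group1 = ?group2}} test is left out, and original_plan, pushed_plan and CountingRng are hypothetical names. The left-hand filter is modelled as a single evaluation because, as the optimised algebra reads, it sits over {{table unit}}.

```python
import random


class CountingRng:
    """Wraps random.Random and counts how often random() is called."""

    def __init__(self, seed):
        self._rng = random.Random(seed)
        self.calls = 0

    def random(self):
        self.calls += 1
        return self._rng.random()


def original_plan(left, right, rng, threshold=0.10):
    """Filter above the join: rand() is evaluated once per joined row."""
    return [
        (l, r)
        for l in left
        for r in right
        if rng.random() < threshold
    ]


def pushed_plan(left, right, rng, threshold=0.10):
    """Filter pushed into both join arguments (assumed model).

    Left: the filter is over (table unit), so rand() is evaluated once
    and keeps or drops the whole left-hand side.
    Right: rand() is evaluated once per row of the right-hand side.
    """
    left_kept = list(left) if rng.random() < threshold else []
    right_kept = [r for r in right if rng.random() < threshold]
    return [(l, r) for l in left_kept for r in right_kept]
```

With 20 left rows and 30 right rows, original_plan evaluates {{rand}} 600 times, once per joined row, while pushed_plan evaluates it 31 times and either keeps or discards every left row together. The two plans therefore do not produce the same distribution of results, which is the instability described above.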