Posted to dev@jena.apache.org by "Rob Vesse (JIRA)" <ji...@apache.org> on 2014/06/30 13:07:24 UTC
[jira] [Created] (JENA-734) Filter Pushing should not apply a
filter multiple times when the expression is not stable e.g. RAND()
Rob Vesse created JENA-734:
------------------------------
Summary: Filter Pushing should not apply a filter multiple times when the expression is not stable e.g. RAND()
Key: JENA-734
URL: https://issues.apache.org/jira/browse/JENA-734
Project: Apache Jena
Issue Type: Bug
Components: ARQ
Affects Versions: Jena 2.11.2
Reporter: Rob Vesse
Fix For: Jena 2.12.0
In our internal testing of 2.11.2 we've encountered a query where the new, more aggressive filter pushing behaviour causes an incorrect query plan to be produced.
The raw SPARQL query is as follows:
{noformat}
SELECT ?s1 ?s2
WHERE
{
  { SELECT ?s1 ?group1
    WHERE
    { ?s1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://localhost:2020/vocab/tbl9sbe_he8g_174> .
      ?s1 <http://localhost:2020/vocab/tbl9sbe_he8g_174_Col_0> ?o1
      BIND(substr(str(?o1), 1, 2) AS ?group1)
    }
  }
  { SELECT ?s2 ?group2
    WHERE
    { ?s2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://localhost:2020/vocab/tbl9sbe_he8g_174> .
      ?s2 <http://localhost:2020/vocab/tbl9sbe_he8g_174_Col_0> ?o2
      BIND(substr(str(?o2), 1, 2) AS ?group2)
    }
    LIMIT 1000
  }
  FILTER ( (?group1 = ?group2) && ( rand() < 0.10 ) )
}
{noformat}
The unoptimised algebra is as follows:
{noformat}
(project (?s1 ?s2)
  (filter (&& (= ?group1 ?group2) (< (rand) 0.10))
    (join
      (project (?s1 ?group1)
        (extend ((?group1 (substr (str ?o1) 1 2)))
          (bgp
            (triple ?s1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://localhost:2020/vocab/tbl9sbe_he8g_174>)
            (triple ?s1 <http://localhost:2020/vocab/tbl9sbe_he8g_174_Col_0> ?o1)
          )))
      (slice _ 1000
        (project (?s2 ?group2)
          (extend ((?group2 (substr (str ?o2) 1 2)))
            (bgp
              (triple ?s2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://localhost:2020/vocab/tbl9sbe_he8g_174>)
              (triple ?s2 <http://localhost:2020/vocab/tbl9sbe_he8g_174_Col_0> ?o2)
            )))))))
{noformat}
However, ARQ optimises this to the following:
{noformat}
(project (?s1 ?s2)
  (filter (= ?group1 ?group2)
    (join
      (project (?s1 ?group1)
        (extend ((?group1 (substr (str ?/o1) 1 2)))
          (sequence
            (filter (< (rand) 0.10)
              (table unit))
            (bgp
              (triple ?s1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://localhost:2020/vocab/tbl9sbe_he8g_174>)
              (triple ?s1 <http://localhost:2020/vocab/tbl9sbe_he8g_174_Col_0> ?/o1)
            ))))
      (filter (< (rand) 0.10)
        (slice _ 1000
          (project (?s2 ?group2)
            (extend ((?group2 (substr (str ?/o2) 1 2)))
              (bgp
                (triple ?s2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://localhost:2020/vocab/tbl9sbe_he8g_174>)
                (triple ?s2 <http://localhost:2020/vocab/tbl9sbe_he8g_174_Col_0> ?/o2)
              ))))))))
{noformat}
Note that filter pushing causes the filter clause containing {{rand}} to be applied twice, once on each side of the join. Since {{rand}} is not a stable function (each call may return a different value), pushing an expression that contains it so that it is evaluated more than once leads to unpredictable results.
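One possible guard is to split the filter condition into its conjuncts and only consider the stable ones for pushing. The following is a minimal sketch of that idea in Python, not ARQ's Java and not how ARQ implements filter placement. Expressions are modelled as nested tuples in the same prefix form as the algebra above. All names ({{UNSTABLE_FUNCTIONS}}, {{is_stable}}, {{split_conjuncts}}, {{partition_for_pushing}}) are hypothetical, and the set of functions treated as unstable is an assumption.

```python
# Expressions are nested tuples in prefix form, e.g. ("<", ("rand",), 0.10).
# Variables are strings and constants are numbers.

# Assumption: functions that may return a different value on each call.
UNSTABLE_FUNCTIONS = {"rand", "uuid", "struuid", "bnode"}


def is_stable(expr):
    """Return True if no unstable function occurs anywhere in expr."""
    if not isinstance(expr, tuple):
        return True  # a variable or a constant
    op, *args = expr
    if op in UNSTABLE_FUNCTIONS:
        return False
    return all(is_stable(arg) for arg in args)


def split_conjuncts(expr):
    """Flatten nested && expressions into a list of conjuncts."""
    if isinstance(expr, tuple) and expr[0] == "&&":
        return [c for arg in expr[1:] for c in split_conjuncts(arg)]
    return [expr]


def partition_for_pushing(expr):
    """Split a filter condition into (pushable, keep_in_place) conjuncts."""
    pushable, keep = [], []
    for conjunct in split_conjuncts(expr):
        (pushable if is_stable(conjunct) else keep).append(conjunct)
    return pushable, keep


# The condition from the query in this report.
condition = ("&&", ("=", "?group1", "?group2"), ("<", ("rand",), 0.10))
pushable, keep = partition_for_pushing(condition)
print(pushable)  # conjuncts that could be considered for pushing
print(keep)      # conjuncts to leave where the query author wrote them
```

Under this sketch the {{rand}} comparison would stay in the outer filter, so it would be evaluated once per joined solution as in the unoptimised algebra.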
Note also that for this query the filter pushing introduces a {{table unit}}; I am unclear where it comes from and whether it is valid.
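To illustrate why the placement over {{table unit}} matters, here is a toy Python model, not ARQ code. It assumes {{table unit}} denotes a single solution with no bindings (the identity for join) and treats {{sequence}} as a plain join. The helper names and the sample data are hypothetical, and a seeded generator with a call counter stands in for {{(< (rand) 0.10)}}.

```python
import random

TABLE_UNIT = [{}]  # assumption: one solution with no bindings


def join(left, right):
    """Join two solution sequences on their shared variables."""
    out = []
    for l in left:
        for r in right:
            if all(l[k] == r[k] for k in l.keys() & r.keys()):
                out.append({**l, **r})
    return out


def filter_rows(rows, predicate):
    """Keep the rows for which the predicate holds; one call per row."""
    return [row for row in rows if predicate(row)]


def make_rand_filter(seed):
    """Build a counting stand-in for (< (rand) 0.10)."""
    rng = random.Random(seed)
    calls = {"n": 0}

    def predicate(_row):
        calls["n"] += 1
        return rng.random() < 0.10

    return predicate, calls


bgp = [{"?s1": f"row{i}"} for i in range(1000)]  # hypothetical data

# Placement as on the left side: (sequence (filter ... (table unit)) (bgp ...))
pred_a, calls_a = make_rand_filter(seed=1)
result_a = join(filter_rows(TABLE_UNIT, pred_a), bgp)

# Placement as on the right side: (filter ... <rows>)
pred_b, calls_b = make_rand_filter(seed=1)
result_b = filter_rows(bgp, pred_b)

print(calls_a["n"], len(result_a))  # one evaluation: every row or no row
print(calls_b["n"], len(result_b))  # one evaluation per row
```

In this model the filter over {{table unit}} is evaluated a single time, so that side of the join keeps either all of its rows or none of them, which differs from testing {{rand}} once per solution.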