Posted to dev@jena.apache.org by "Rob Vesse (JIRA)" <ji...@apache.org> on 2014/06/30 13:07:24 UTC

[jira] [Created] (JENA-734) Filter Pushing should not apply a filter multiple times when the expression is not stable e.g. RAND()

Rob Vesse created JENA-734:
------------------------------

             Summary: Filter Pushing should not apply a filter multiple times when the expression is not stable e.g. RAND()
                 Key: JENA-734
                 URL: https://issues.apache.org/jira/browse/JENA-734
             Project: Apache Jena
          Issue Type: Bug
          Components: ARQ
    Affects Versions: Jena 2.11.2
            Reporter: Rob Vesse
             Fix For: Jena 2.12.0


In our internal testing of the 2.11.2 release, we've encountered a query where the new, more aggressive filter pushing behaviour causes an incorrect query plan to be produced.

The raw SPARQL query is as follows:

{noformat}
SELECT ?s1 ?s2
WHERE
{
{SELECT  ?s1 ?group1
WHERE
{ ?s1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://localhost:2020/vocab/tbl9sbe_he8g_174> .
  ?s1 <http://localhost:2020/vocab/tbl9sbe_he8g_174_Col_0> ?o1
  BIND(substr(str(?o1), 1, 2) AS ?group1)
}
}
{SELECT  ?s2 ?group2
WHERE
{ ?s2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://localhost:2020/vocab/tbl9sbe_he8g_174> .
  ?s2 <http://localhost:2020/vocab/tbl9sbe_he8g_174_Col_0> ?o2
  BIND(substr(str(?o2), 1, 2) AS ?group2)
}
LIMIT   1000
}
FILTER ( (?group1 = ?group2) && ( rand() < 0.10 ) )
}
{noformat}

The unoptimised algebra is as follows:

{noformat}
(project (?s1 ?s2)
  (filter (&& (= ?group1 ?group2) (< (rand) 0.10))
    (join
      (project (?s1 ?group1)
        (extend ((?group1 (substr (str ?o1) 1 2)))
          (bgp
            (triple ?s1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://localhost:2020/vocab/tbl9sbe_he8g_174>)
            (triple ?s1 <http://localhost:2020/vocab/tbl9sbe_he8g_174_Col_0> ?o1)
          )))
      (slice _ 1000
        (project (?s2 ?group2)
          (extend ((?group2 (substr (str ?o2) 1 2)))
            (bgp
              (triple ?s2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://localhost:2020/vocab/tbl9sbe_he8g_174>)
              (triple ?s2 <http://localhost:2020/vocab/tbl9sbe_he8g_174_Col_0> ?o2)
            )))))))
{noformat}

However, ARQ optimises this to the following:

{noformat}
(project (?s1 ?s2)
  (filter (= ?group1 ?group2)
    (join
      (project (?s1 ?group1)
        (extend ((?group1 (substr (str ?/o1) 1 2)))
          (sequence
            (filter (< (rand) 0.10)
              (table unit))
            (bgp
              (triple ?s1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://localhost:2020/vocab/tbl9sbe_he8g_174>)
              (triple ?s1 <http://localhost:2020/vocab/tbl9sbe_he8g_174_Col_0> ?/o1)
            ))))
      (filter (< (rand) 0.10)
        (slice _ 1000
          (project (?s2 ?group2)
            (extend ((?group2 (substr (str ?/o2) 1 2)))
              (bgp
                (triple ?s2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://localhost:2020/vocab/tbl9sbe_he8g_174>)
                (triple ?s2 <http://localhost:2020/vocab/tbl9sbe_he8g_174_Col_0> ?/o2)
              ))))))))
{noformat}
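One way to see how this plan shape can arise is a toy placement rule: split the {{&&}} into its conjuncts and push each conjunct into every join branch whose variables cover it. A conjunct that mentions no variables, such as {{(< (rand) 0.10)}}, is trivially covered by both branches, so it gets copied into each. The Python below is a minimal, hypothetical model under that assumption, not ARQ's actual filter placement code; {{Conjunct}}, {{place_conjuncts}} and the {{guard_unstable}} flag are illustrative names. With {{guard_unstable=True}} it follows the behaviour asked for in the summary and leaves unstable conjuncts above the join.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conjunct:
    """One conjunct of a FILTER expression (hypothetical model, not an ARQ class)."""
    text: str                   # SSE rendering, for display only
    variables: frozenset        # variables the expression mentions
    deterministic: bool = True  # False for unstable functions such as RAND()


def place_conjuncts(conjuncts, left_vars, right_vars, guard_unstable):
    """Decide where each conjunct of filter(&&(...), join(left, right)) goes.

    A conjunct is pushed into every branch whose variables cover it and stays
    above the join if neither branch covers it. With guard_unstable=True,
    non-deterministic conjuncts are never pushed, so they stay where the
    query wrote them. Returns (above_join, left_filters, right_filters).
    """
    above, left, right = [], [], []
    for c in conjuncts:
        if guard_unstable and not c.deterministic:
            above.append(c)
            continue
        # An empty variable set is a subset of every set, so a variable-free
        # conjunct such as (< (rand) 0.10) is "covered" by both branches.
        to_left = c.variables <= left_vars
        to_right = c.variables <= right_vars
        if to_left:
            left.append(c)
        if to_right:
            right.append(c)
        if not (to_left or to_right):
            above.append(c)
    return above, left, right


group_eq = Conjunct("(= ?group1 ?group2)", frozenset({"?group1", "?group2"}))
rand_lt = Conjunct("(< (rand) 0.10)", frozenset(), deterministic=False)
LEFT_VARS = frozenset({"?s1", "?group1"})
RIGHT_VARS = frozenset({"?s2", "?group2"})

naive = place_conjuncts([group_eq, rand_lt], LEFT_VARS, RIGHT_VARS, guard_unstable=False)
guarded = place_conjuncts([group_eq, rand_lt], LEFT_VARS, RIGHT_VARS, guard_unstable=True)
for label, (above, left, right) in (("naive", naive), ("guarded", guarded)):
    print(label, "above:", [c.text for c in above],
          "left:", [c.text for c in left], "right:", [c.text for c in right])
```

With the guard off, the model reproduces the placement seen above: the equality stays above the join and the {{rand}} conjunct is copied into both branches. With the guard on, the {{rand}} conjunct stays above the join, where it is applied once per joined solution.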

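Note that the {{rand}} part of the filter gets applied twice as a result of the filter pushing: once over the {{table unit}} inside the left-hand sub-select and once above the {{slice}} on the right-hand side. As {{rand}} is not a stable function, each copy draws its own random numbers, so pushing an expression containing it such that it is applied twice leads to unpredictable results.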
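To make the effect concrete, the sketch below simulates both plans on hypothetical toy data (50 rows per sub-select, 5 groups of 10). It assumes each {{filter}} evaluates {{rand()}} once per input solution, so the copy over {{table unit}} draws a single number for the whole left-hand branch, while the copy above the {{slice}} draws one per right-hand row. The data, function names and trial count are illustrative assumptions, not taken from the original test case.

```python
import random

# Hypothetical toy data standing in for the two sub-SELECTs:
# (subject, group key) rows, 50 per side, 5 groups of 10 rows each.
LEFT = [(f"s1_{i}", i % 5) for i in range(50)]
RIGHT = [(f"s2_{i}", i % 5) for i in range(50)]


def original_plan(left, right, rng, p=0.10):
    """filter((= ?group1 ?group2) && rand() < p) over join(left, right).

    One random draw per matching joined solution (Python's `and`
    short-circuits, which does not change the fraction that survives)."""
    return [(s1, s2)
            for s1, g1 in left
            for s2, g2 in right
            if g1 == g2 and rng.random() < p]


def pushed_plan(left, right, rng, p=0.10):
    """Shape of the optimised plan: rand() < p is applied twice."""
    # Copy over (table unit): a single draw keeps or drops the whole left branch.
    left_kept = left if rng.random() < p else []
    # Copy above the slice: one draw per right-hand row.
    right_kept = [(s2, g2) for s2, g2 in right if rng.random() < p]
    # The equality conjunct stays above the join.
    return [(s1, s2)
            for s1, g1 in left_kept
            for s2, g2 in right_kept
            if g1 == g2]


def summarise(plan, trials=2000, seed=0):
    """Run a plan repeatedly; return (mean result size, fraction of empty results)."""
    rng = random.Random(seed)
    sizes = [len(plan(LEFT, RIGHT, rng)) for _ in range(trials)]
    return sum(sizes) / trials, sizes.count(0) / trials


orig_mean, orig_empty = summarise(original_plan)
pushed_mean, pushed_empty = summarise(pushed_plan)
print(f"original: mean {orig_mean:.1f} rows, empty in {orig_empty:.1%} of runs")
print(f"pushed:   mean {pushed_mean:.1f} rows, empty in {pushed_empty:.1%} of runs")
```

Analytically, the original plan keeps about 10% of the 500 matching pairs (about 50 rows per run), while the pushed plan returns nothing whenever the single left-hand draw fails, i.e. in roughly 90% of runs, and averages only about 5 rows.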

Note that for this query the filter pushing also introduces a {{table unit}}; I am unclear where it comes from and whether it is valid.



--
This message was sent by Atlassian JIRA
(v6.2#6252)