Posted to dev@jena.apache.org by "Andy Seaborne (Jira)" <ji...@apache.org> on 2020/01/13 12:30:00 UTC

[jira] [Resolved] (JENA-1813) Join optimization transform results in incorrect query results

     [ https://issues.apache.org/jira/browse/JENA-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Seaborne resolved JENA-1813.
---------------------------------
    Fix Version/s: Jena 3.14.0
       Resolution: Fixed

Immediate fix as suggested.

See JENA-1815 for longer-term work.

> Join optimization transform results in incorrect query results
> --------------------------------------------------------------
>
>                 Key: JENA-1813
>                 URL: https://issues.apache.org/jira/browse/JENA-1813
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: ARQ
>    Affects Versions: Jena 3.13.1
>            Reporter: Shawn Smith
>            Assignee: Andy Seaborne
>            Priority: Major
>             Fix For: Jena 3.14.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> I think I've found a query for which TransformJoinStrategy incorrectly decides that the join is linear, so it replaces the "join" operation with a "sequence" operation. As a result, the query returns incorrect results. Disabling optimization with "qe.getContext().set(ARQ.optimization, false)" works around the issue.
> Here's the query: 
> {noformat}
> PREFIX  :  <http://example.com/>
> SELECT ?a
> WHERE {
>   GRAPH :graph { :s :p ?a }
>   GRAPH :graph {
>     SELECT (?b AS ?a)
>     WHERE { :t :q ?b }
>     GROUP BY ?b
>   }
> }
> {noformat}
> Here's the data to test it with (two quads, as TriG):
> {noformat}
> @prefix :      <http://example.com/> .
> :graph {
>     :s      :p      "a" .
>     :t      :q      "b" .
> }
> {noformat}
> I expected the query to return zero results because the two GRAPH clauses can't find compatible bindings for ?a.  But, in practice, Jena returns ?a="a" and logs a warning:
> {noformat}
> [main] WARN  BindingUtils - merge: Mismatch : "a" != "b"
> {noformat}
> Note that the warning actually comes from QueryIterProjectMerge.java, not BindingUtils.java.  With more complicated queries and datasets, this issue can result in thousands or millions of logged warnings.
> The query plan before optimization looks like this:
> {noformat}
> (project (?a)
>   (join
>     (graph <http://example.com/graph>
>       (bgp (triple <http://example.com/s> <http://example.com/p> ?a)))
>     (graph <http://example.com/graph>
>       (project (?a)
>         (extend ((?a ?b))
>           (group (?b)
>             (bgp (triple <http://example.com/t> <http://example.com/q> ?b))))))))
> {noformat}
> Optimization replaces "join" with "sequence", which fails to detect the conflict on ?a:
> {noformat}
> (project (?a)
>   (sequence
>     (graph <http://example.com/graph>
>       (bgp (triple <http://example.com/s> <http://example.com/p> ?a)))
>     (graph <http://example.com/graph>
>       (project (?a)
>         (extend ((?a ?/b))
>           (group (?/b)
>             (bgp (triple <http://example.com/t> <http://example.com/q> ?/b))))))))
> {noformat}
> For convenience, here's Java code that reproduces the bug:
> {noformat}
> import org.apache.jena.query.ARQ;
> import org.apache.jena.query.Dataset;
> import org.apache.jena.query.DatasetFactory;
> import org.apache.jena.query.QueryExecution;
> import org.apache.jena.query.QueryExecutionFactory;
> import org.apache.jena.query.ResultSet;
> import org.apache.jena.riot.Lang;
> import org.apache.jena.riot.RDFParser;
> import org.junit.Test;
> public class QueryTest {
>     @Test
>     public void testGraphQuery() {
>         String query = "" +
>                 "PREFIX  :  <http://example.com/>\n" +
>                 "SELECT ?a\n" +
>                 "WHERE {\n" +
>                 "  GRAPH :graph { :s :p ?a }\n" +
>                 "  GRAPH :graph {\n" +
>                 "    SELECT (?b AS ?a)\n" +
>                 "    WHERE { :t :q ?b }\n" +
>                 "    GROUP BY ?b\n" +
>                 "  }\n" +
>                 "}\n";
>         String data = "" +
>                 "@prefix :  <http://example.com/> .\n" +
>                 ":graph {\n" +
>                 "  :s  :p  \"a\" .\n" +
>                 "  :t  :q  \"b\" .\n" +
>                 "}\n";
>         Dataset ds = DatasetFactory.create();
>         RDFParser.fromString(data).lang(Lang.TRIG).parse(ds);
>         try (QueryExecution qe = QueryExecutionFactory.create(query, ds)) {
>             qe.getContext().set(ARQ.optimization, true);  // flipping this to false fixes the test
>             ResultSet rs = qe.execSelect();
>             if (rs.hasNext()) {
>                 System.out.println(rs.nextBinding());
>                 throw new AssertionError("Result set should be empty");
>             }
>         }
>     }
> }
> {noformat}
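
As a rough illustration of why the quoted description expects zero rows, the sketch below models solution bindings as plain Java maps and contrasts a compatibility-checked join with a merge that skips the check. It is not ARQ code and does not reflect how ARQ implements "join" or "sequence"; the class JoinSketch and the methods compatible, join and mergeUnchecked are hypothetical names used only here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: a binding is a Map from variable name to value,
// not a Jena Binding. None of these names exist in ARQ.
class JoinSketch {

    // Two bindings are compatible when every variable they share has the same value.
    static boolean compatible(Map<String, String> left, Map<String, String> right) {
        for (Map.Entry<String, String> e : left.entrySet()) {
            String other = right.get(e.getKey());
            if (other != null && !other.equals(e.getValue())) {
                return false;
            }
        }
        return true;
    }

    // Join: keep the merged binding only for compatible pairs.
    static List<Map<String, String>> join(List<Map<String, String>> left,
                                          List<Map<String, String>> right) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> l : left) {
            for (Map<String, String> r : right) {
                if (compatible(l, r)) {
                    Map<String, String> merged = new HashMap<>(l);
                    merged.putAll(r);
                    out.add(merged);
                }
            }
        }
        return out;
    }

    // Merge without the compatibility check: on a conflict the left value wins,
    // which mimics the reported symptom (a row is returned instead of none).
    static List<Map<String, String>> mergeUnchecked(List<Map<String, String>> left,
                                                    List<Map<String, String>> right) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> l : left) {
            for (Map<String, String> r : right) {
                Map<String, String> merged = new HashMap<>(r);
                merged.putAll(l); // left overwrites right, no conflict detection
                out.add(merged);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Left side: GRAPH :graph { :s :p ?a } binds ?a to "a".
        List<Map<String, String>> left = List.of(Map.of("a", "a"));
        // Right side: the sub-select projects ?b AS ?a, so ?a is "b".
        List<Map<String, String>> right = List.of(Map.of("a", "b"));

        System.out.println("join:            " + join(left, right));
        System.out.println("unchecked merge: " + mergeUnchecked(left, right));
    }
}
```

With the data from the description, the checked join produces no rows because the two sides disagree on ?a, while the unchecked merge produces one row with ?a bound to "a", matching the incorrect result reported above.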


