You are viewing a plain text version of this content. The canonical link for it is here.
Posted to pr@jena.apache.org by GitBox <gi...@apache.org> on 2022/11/15 09:32:07 UTC

[GitHub] [jena] rvesse commented on a diff in pull request #1616: improve jena performance and behaviour in certain path cases

rvesse commented on code in PR #1616:
URL: https://github.com/apache/jena/pull/1616#discussion_r1022546486


##########
jena-arq/src/main/java/org/apache/jena/sparql/engine/iterator/QueryIterConcatIterator.java:
##########
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.jena.sparql.engine.iterator;
+
+import org.apache.jena.atlas.io.IndentedWriter;
+import org.apache.jena.atlas.lib.Lib;
+import org.apache.jena.sparql.engine.ExecutionContext;
+import org.apache.jena.sparql.engine.QueryIterator;
+import org.apache.jena.sparql.engine.binding.Binding;
+import org.apache.jena.sparql.serializer.SerializationContext;
+
+import java.util.*;
+
+
+/**
+ * A query iterator that joins two or more iterators into a single iterator. */ 
+
+public class QueryIterConcatIterator extends QueryIter

Review Comment:
   @afs Is there not already a query iterator that supported this pattern?
   
   I see a `QueryIterConcat` referenced in the code deletions so why was that not suitable?  Could that have been modified to solve the issue rather than introducing a new class



##########
jena-arq/src/main/java/org/apache/jena/sparql/algebra/optimize/TransformPathFlattern.java:
##########
@@ -32,7 +37,7 @@
  * For example,
  * <ul>
  * <li>Path seq {@literal ->} BGPs or a (sequence)
- * <li>"|" is not expanded into a union.
+ * <li>"|" is expanded into a union.

Review Comment:
   Need to add new optimiser test cases that verify the modified transform is having the desired effects, and to verify it doesn't make semantically unsafe transforms as noted in other comments



##########
jena-arq/src/main/java/org/apache/jena/sparql/algebra/optimize/TransformPathFlattern.java:
##########
@@ -53,9 +58,60 @@ public TransformPathFlattern(PathCompiler pathCompiler)
     @Override
     public Op transform(OpPath opPath)
     {
-        // Flatten down to triples where possible.
-        PathBlock pattern = pathCompiler.reduce(opPath.getTriplePath()) ;
-        // Any generated paths of exactly one to triple; convert to Op.
-        return PathLib.pathToTriples(pattern) ;
+        Op op = transformReduce(opPath, pathCompiler);
+        return op;
     }
-}
+
+    static Op transformReduce(Op op, PathCompiler pathCompiler) {
+        return Transformer.transform(
+                new TransformCopy() {
+                    @Override
+                    public Op transform(OpPath opPath) {
+                        // Flatten down to triples where possible.
+                        PathBlock pattern = pathCompiler.reduce(opPath.getTriplePath());
+                        // Any generated paths of exactly one to triple; convert to Op.
+                        Op op = PathLib.pathToTriples(pattern);
+                        return transformAlt(op, pathCompiler);
+                    }
+                },
+                op);
+    }
+
+    static Op transformAlt(Op op, PathCompiler pathCompiler) {
+        return Transformer.transform(new AltTransformer(pathCompiler), op);
+    }
+
+    private static class AltTransformer extends TransformCopy {
+        private PathCompiler pathCompiler;
+
+        public AltTransformer(PathCompiler pathCompiler) {
+            this.pathCompiler = pathCompiler;
+        }
+
+        @Override
+        public Op transform(OpPath opPath) {
+            TriplePath triplePath = opPath.getTriplePath();
+            Path path = triplePath.getPath();
+            if (path instanceof P_Alt) {
+                Node s = triplePath.getSubject();

Review Comment:
   So I don't believe this transformation is semantically safe in all cases.
   
   At a minimum it needs to consider whether either end of the path is a blank node variable as pushing those into a union will make them unique variables relative to the outside of the union and change the semantics of the query.
   
   i.e. a query like `SELECT * WHERE { ?s ex:foo / (ex:a | ex:b) ?o }` can have the first part of the path rewritten to yield `SELECT * WHERE { ?s ex:foo _:bnode . _:bnode (ex:a | ex:b) ?o }`, if you then rewrite the 2nd part into a union you change the semantics -> `SELECT * WHERE { ?s ex:foo _:bnode . { _:bnode ex:a ?o . } UNION { _:bnode ex:b ?o } }`
   
   From the [SPARQL 1.1 specification](https://www.w3.org/TR/sparql11-query/#QSynBlankNodes):
   
   > Blank nodes are indicated by either the label form, such as "_:abc", or the abbreviated form "[]". A blank node that is used in only one place in the query syntax can be indicated with []. A unique blank node will be used to form the triple pattern. Blank node labels are written as "_:abc" for a blank node with label "abc". The same blank node label cannot be used in two different basic graph patterns in the same query.
   
   So that's technically an illegal query you've generated. Evaluation wise those two `_:bnode` variables are now in different scopes and so rather than joining the results of the BGP and the Union you'll be generating a cross-product
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org