You are viewing a plain text version of this content. The canonical link for it is here.
Posted to pr@jena.apache.org by GitBox <gi...@apache.org> on 2022/11/14 21:13:53 UTC

[GitHub] [jena] SimonBin opened a new pull request, #1616: improve jena performance and behaviour in certain path cases

SimonBin opened a new pull request, #1616:
URL: https://github.com/apache/jena/pull/1616

   Issues fixed (or at least improved): JENA-1300 JENA-2325 GH-1614
   
   - the GraphUtils.allNodes used in the worst case determineUngroundedStartingSet was previously computing the contents of the whole data set. this has two drawbacks, (1) not lazy (2) not easy to abort, i.e queryTimeout was not respected
   
     changed it to Iter.distinct (unfortunately that means in worst case it will still require to load everything into memory)
   
   - the PathLib.execUngroundedPath and execUngroundedPathSameVar were both consuming the iterator fully while creating new QueryIterConcats. Drawbacks are the same as above, additionally LIMIT could not be pushed down. Changed it to be lazy instead
   
     a new helper QueryIter based on a copy of QueryIterConcat, but for Iterators.
   
   - the TransformPathFlattern which was implemented by pathCompiler.reduce did not implement P_Alt optimisation. The reason for this was unknown (JENA-2325). The optimisation was added to the Flatterner in addition to the pathCompiler.reduce step (copied from the FlatternerStd).
   
   nb.  it should be possible to rewrite the PathEvaluator/PathEngine with a Iterator Pull pattern instead of recursive fill, as hinted by @afs. however due to unfamiliarity with the code and probably being a larger undertaking this part was not touched. As a consequence, the code can still get stuck somewhere inside PathEval/PathEngine
   
   co-authored with @LorenzBuehmann
   
   
   
   
   ----
   
    - [ ] Tests are included.
    - [ ] Documentation change and updates are provided for the [Apache Jena website](https://github.com/apache/jena-site/)
    - [ ] Commits have been squashed to remove intermediate development commit messages.
    - [ ] Key commit messages start with the issue number (GH-xxxx or JENA-xxxx)
   
   By submitting this pull request, I acknowledge that I am making a contribution to the Apache Software Foundation under the terms and conditions of the [Contributor's Agreement](https://www.apache.org/licenses/contributor-agreements.html).
   
   ----
   
   See the [Apache Jena "Contributing" guide](https://github.com/apache/jena/blob/main/CONTRIBUTING.md).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] SimonBin commented on pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
SimonBin commented on PR #1616:
URL: https://github.com/apache/jena/pull/1616#issuecomment-1314390029

   sample path: ` ?a :p1|^:p2 ?b `


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] SimonBin commented on pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
SimonBin commented on PR #1616:
URL: https://github.com/apache/jena/pull/1616#issuecomment-1317294563

   my concern would be that this completely different path flatterning in #1620  has never been put to wide testing and per the comment in the source file, `It does not produce very nice execution structures so ARQ uses
    a functional equivalent, but different, transformation.`. One obvious instance from your tests, `:p1{2}` now becomes a join instead of a simple bgp (although I guess the other optimisers will turn that back into a sequence so maybe it doesn't matter)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] SimonBin commented on a diff in pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
SimonBin commented on code in PR #1616:
URL: https://github.com/apache/jena/pull/1616#discussion_r1023784610


##########
jena-arq/src/main/java/org/apache/jena/sparql/engine/iterator/QueryIterConcatIterator.java:
##########
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.jena.sparql.engine.iterator;
+
+import org.apache.jena.atlas.io.IndentedWriter;
+import org.apache.jena.atlas.lib.Lib;
+import org.apache.jena.sparql.engine.ExecutionContext;
+import org.apache.jena.sparql.engine.QueryIterator;
+import org.apache.jena.sparql.engine.binding.Binding;
+import org.apache.jena.sparql.serializer.SerializationContext;
+
+import java.util.*;
+
+
+/**
+ * A query iterator that joins two or more iterators into a single iterator. */ 
+
+public class QueryIterConcatIterator extends QueryIter

Review Comment:
   hummmm... I can see how the execUngroundedPath could be rewritten with ExtendByVar + implementing RepeatApply but for the  execUngroundedPathSameVar the code is more cumbersome as we want to go from Node -> (x, node) -> (x, binding) -> QueryIterator but the RepeatApply only takes a binding



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] SimonBin commented on a diff in pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
SimonBin commented on code in PR #1616:
URL: https://github.com/apache/jena/pull/1616#discussion_r1023240331


##########
jena-arq/src/main/java/org/apache/jena/sparql/algebra/optimize/TransformPathFlattern.java:
##########
@@ -53,9 +58,60 @@ public TransformPathFlattern(PathCompiler pathCompiler)
     @Override
     public Op transform(OpPath opPath)
     {
-        // Flatten down to triples where possible.
-        PathBlock pattern = pathCompiler.reduce(opPath.getTriplePath()) ;
-        // Any generated paths of exactly one to triple; convert to Op.
-        return PathLib.pathToTriples(pattern) ;
+        Op op = transformReduce(opPath, pathCompiler);
+        return op;
     }
-}
+
+    static Op transformReduce(Op op, PathCompiler pathCompiler) {
+        return Transformer.transform(
+                new TransformCopy() {
+                    @Override
+                    public Op transform(OpPath opPath) {
+                        // Flatten down to triples where possible.
+                        PathBlock pattern = pathCompiler.reduce(opPath.getTriplePath());
+                        // Any generated paths of exactly one to triple; convert to Op.
+                        Op op = PathLib.pathToTriples(pattern);
+                        return transformAlt(op, pathCompiler);
+                    }
+                },
+                op);
+    }
+
+    static Op transformAlt(Op op, PathCompiler pathCompiler) {
+        return Transformer.transform(new AltTransformer(pathCompiler), op);
+    }
+
+    private static class AltTransformer extends TransformCopy {
+        private PathCompiler pathCompiler;
+
+        public AltTransformer(PathCompiler pathCompiler) {
+            this.pathCompiler = pathCompiler;
+        }
+
+        @Override
+        public Op transform(OpPath opPath) {
+            TriplePath triplePath = opPath.getTriplePath();
+            Path path = triplePath.getPath();
+            if (path instanceof P_Alt) {
+                Node s = triplePath.getSubject();

Review Comment:
   manual testing based on the input provided by @rvesse , I added some tests to TestTransformPathFlatten now



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] afs commented on a diff in pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
afs commented on code in PR #1616:
URL: https://github.com/apache/jena/pull/1616#discussion_r1024277910


##########
jena-arq/src/main/java/org/apache/jena/sparql/algebra/optimize/TransformPathFlattern.java:
##########
@@ -53,9 +58,53 @@ public TransformPathFlattern(PathCompiler pathCompiler)
     @Override
     public Op transform(OpPath opPath)
     {
+        Op op = transformReduce(opPath, pathCompiler);
+        return op;
+    }
+
+    static Op transformReduce(OpPath opPath, PathCompiler pathCompiler) {
         // Flatten down to triples where possible.
-        PathBlock pattern = pathCompiler.reduce(opPath.getTriplePath()) ;
+        PathBlock pattern = pathCompiler.reduce(opPath.getTriplePath());
         // Any generated paths of exactly one to triple; convert to Op.
-        return PathLib.pathToTriples(pattern) ;
+        Op op = PathLib.pathToTriples(pattern);
+        return transformAlt(op, pathCompiler);

Review Comment:
   A request for explanation: 
   
   We seem to have two mechanism for the same thing here - PathCompiler and AltTransformer+transformReduce.
   
   What advantages does this bring?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan commented on a diff in pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
Aklakan commented on code in PR #1616:
URL: https://github.com/apache/jena/pull/1616#discussion_r1022619507


##########
jena-arq/src/main/java/org/apache/jena/sparql/algebra/optimize/TransformPathFlattern.java:
##########
@@ -53,9 +58,60 @@ public TransformPathFlattern(PathCompiler pathCompiler)
     @Override
     public Op transform(OpPath opPath)
     {
-        // Flatten down to triples where possible.
-        PathBlock pattern = pathCompiler.reduce(opPath.getTriplePath()) ;
-        // Any generated paths of exactly one to triple; convert to Op.
-        return PathLib.pathToTriples(pattern) ;
+        Op op = transformReduce(opPath, pathCompiler);
+        return op;
     }
-}
+
+    static Op transformReduce(Op op, PathCompiler pathCompiler) {
+        return Transformer.transform(
+                new TransformCopy() {
+                    @Override
+                    public Op transform(OpPath opPath) {
+                        // Flatten down to triples where possible.
+                        PathBlock pattern = pathCompiler.reduce(opPath.getTriplePath());
+                        // Any generated paths of exactly one to triple; convert to Op.
+                        Op op = PathLib.pathToTriples(pattern);
+                        return transformAlt(op, pathCompiler);
+                    }
+                },
+                op);
+    }
+
+    static Op transformAlt(Op op, PathCompiler pathCompiler) {
+        return Transformer.transform(new AltTransformer(pathCompiler), op);
+    }
+
+    private static class AltTransformer extends TransformCopy {
+        private PathCompiler pathCompiler;
+
+        public AltTransformer(PathCompiler pathCompiler) {
+            this.pathCompiler = pathCompiler;
+        }
+
+        @Override
+        public Op transform(OpPath opPath) {
+            TriplePath triplePath = opPath.getTriplePath();
+            Path path = triplePath.getPath();
+            if (path instanceof P_Alt) {
+                Node s = triplePath.getSubject();

Review Comment:
   Hm, the transform happens on the algebra level so all bnodes should have been renamed to variables with appropriate scoping - right?
   Also, I guess the transform is not correctly reversible with `OpAsQuery` but at this point in the optimization process it doesn't have to be.
   On the syntax level the transformation would certainly suffer from the issue which you mentioned.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] afs commented on pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
afs commented on PR #1616:
URL: https://github.com/apache/jena/pull/1616#issuecomment-1315880300

   `TransformPathFlatternAlt` performs `P_Alt` transformation. 
   
   IIRC it interacts with the BGP extension mechanism when considering beyond simple entailment. We;ll need to investigate this.
   
   Things to consider:
   
   * It might be better to restrict optimization to the case of the specific and important case of `(property | property)`, not general path. This is mentioned on [JENA-2325](https://issues.apache.org/jira/browse/JENA-2325).
   * It might be better to do performance improvements in the evaluation stage to preserve semantics. 
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] SimonBin commented on a diff in pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
SimonBin commented on code in PR #1616:
URL: https://github.com/apache/jena/pull/1616#discussion_r1023316226


##########
jena-arq/src/main/java/org/apache/jena/sparql/engine/iterator/QueryIterConcatIterator.java:
##########
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.jena.sparql.engine.iterator;
+
+import org.apache.jena.atlas.io.IndentedWriter;
+import org.apache.jena.atlas.lib.Lib;
+import org.apache.jena.sparql.engine.ExecutionContext;
+import org.apache.jena.sparql.engine.QueryIterator;
+import org.apache.jena.sparql.engine.binding.Binding;
+import org.apache.jena.sparql.serializer.SerializationContext;
+
+import java.util.*;
+
+
+/**
+ * A query iterator that joins two or more iterators into a single iterator. */ 
+
+public class QueryIterConcatIterator extends QueryIter

Review Comment:
   would that have any advantage over or be significantly differnt from QueryIterPlainWrapper(Iter.flatMap)? Note how both cases seem to involve a QueryIterPlainWrapper 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan commented on a diff in pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
Aklakan commented on code in PR #1616:
URL: https://github.com/apache/jena/pull/1616#discussion_r1023314216


##########
jena-arq/src/main/java/org/apache/jena/sparql/engine/iterator/QueryIterConcatIterator.java:
##########
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.jena.sparql.engine.iterator;
+
+import org.apache.jena.atlas.io.IndentedWriter;
+import org.apache.jena.atlas.lib.Lib;
+import org.apache.jena.sparql.engine.ExecutionContext;
+import org.apache.jena.sparql.engine.QueryIterator;
+import org.apache.jena.sparql.engine.binding.Binding;
+import org.apache.jena.sparql.serializer.SerializationContext;
+
+import java.util.*;
+
+
+/**
+ * A query iterator that joins two or more iterators into a single iterator. */ 
+
+public class QueryIterConcatIterator extends QueryIter

Review Comment:
   You can rearrange the code in `PathLib` along the lines of first turning the Nodes to Bindings, and then wrap that as a QueryIterator which serves as input to QueryIterRepeatApply:
   
   ```java
   Iterator<Node> itNode = determineUngroundedStartingSet(graph, path, execCxt) ;
   Iterator<Binding> itBinding = Iter.iter(itNode).map(n -> BindingFactory.binding(binding, sVar, n));
   QueryIterator lhs = QueryIterPlainWrapper.wrap(itBinding);
   QueryIterator combined = new QueryIterRepeatApply(lhs) {
       QueryIterator nextStage(Binding b2) {
           Iterator<Node> pathIter = PathEval.eval(graph, b2.get(sVar), path, execCxt.getContext());
           return evalGroundedOneEnd(b2, pathIter, oVar, execCxt);
     }
   };
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] afs commented on pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
afs commented on PR #1616:
URL: https://github.com/apache/jena/pull/1616#issuecomment-1316669973

   > an unbound triple pattern by gathering as seed nodes all subjects and objects in the graph
   
   Part 1 of #1614 is about a lack of reordering across paths and isn't in this PR.
   
   We need to check we aren't heading off in the wrong direction with changes in path eval when there is a bigger picture.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] SimonBin commented on a diff in pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
SimonBin commented on code in PR #1616:
URL: https://github.com/apache/jena/pull/1616#discussion_r1023326012


##########
jena-arq/src/main/java/org/apache/jena/sparql/algebra/optimize/TransformPathFlattern.java:
##########
@@ -53,9 +58,60 @@ public TransformPathFlattern(PathCompiler pathCompiler)
     @Override
     public Op transform(OpPath opPath)
     {
-        // Flatten down to triples where possible.
-        PathBlock pattern = pathCompiler.reduce(opPath.getTriplePath()) ;
-        // Any generated paths of exactly one to triple; convert to Op.
-        return PathLib.pathToTriples(pattern) ;
+        Op op = transformReduce(opPath, pathCompiler);
+        return op;
     }
-}
+
+    static Op transformReduce(Op op, PathCompiler pathCompiler) {
+        return Transformer.transform(

Review Comment:
   so am I! fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] SimonBin commented on a diff in pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
SimonBin commented on code in PR #1616:
URL: https://github.com/apache/jena/pull/1616#discussion_r1023240331


##########
jena-arq/src/main/java/org/apache/jena/sparql/algebra/optimize/TransformPathFlattern.java:
##########
@@ -53,9 +58,60 @@ public TransformPathFlattern(PathCompiler pathCompiler)
     @Override
     public Op transform(OpPath opPath)
     {
-        // Flatten down to triples where possible.
-        PathBlock pattern = pathCompiler.reduce(opPath.getTriplePath()) ;
-        // Any generated paths of exactly one to triple; convert to Op.
-        return PathLib.pathToTriples(pattern) ;
+        Op op = transformReduce(opPath, pathCompiler);
+        return op;
     }
-}
+
+    static Op transformReduce(Op op, PathCompiler pathCompiler) {
+        return Transformer.transform(
+                new TransformCopy() {
+                    @Override
+                    public Op transform(OpPath opPath) {
+                        // Flatten down to triples where possible.
+                        PathBlock pattern = pathCompiler.reduce(opPath.getTriplePath());
+                        // Any generated paths of exactly one to triple; convert to Op.
+                        Op op = PathLib.pathToTriples(pattern);
+                        return transformAlt(op, pathCompiler);
+                    }
+                },
+                op);
+    }
+
+    static Op transformAlt(Op op, PathCompiler pathCompiler) {
+        return Transformer.transform(new AltTransformer(pathCompiler), op);
+    }
+
+    private static class AltTransformer extends TransformCopy {
+        private PathCompiler pathCompiler;
+
+        public AltTransformer(PathCompiler pathCompiler) {
+            this.pathCompiler = pathCompiler;
+        }
+
+        @Override
+        public Op transform(OpPath opPath) {
+            TriplePath triplePath = opPath.getTriplePath();
+            Path path = triplePath.getPath();
+            if (path instanceof P_Alt) {
+                Node s = triplePath.getSubject();

Review Comment:
   manual testing based on the input provided by @rvesse , I will add some tests to TestTransformPathFlatten 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] SimonBin commented on pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
SimonBin commented on PR #1616:
URL: https://github.com/apache/jena/pull/1616#issuecomment-1327237937

   I removed the Transformer. Note, step 1  of  #1629 would mean to change PathEval to iters (see my comment there, starting point would be PathEval.eval$), that is not implemented in this PR


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan commented on a diff in pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
Aklakan commented on code in PR #1616:
URL: https://github.com/apache/jena/pull/1616#discussion_r1022619507


##########
jena-arq/src/main/java/org/apache/jena/sparql/algebra/optimize/TransformPathFlattern.java:
##########
@@ -53,9 +58,60 @@ public TransformPathFlattern(PathCompiler pathCompiler)
     @Override
     public Op transform(OpPath opPath)
     {
-        // Flatten down to triples where possible.
-        PathBlock pattern = pathCompiler.reduce(opPath.getTriplePath()) ;
-        // Any generated paths of exactly one to triple; convert to Op.
-        return PathLib.pathToTriples(pattern) ;
+        Op op = transformReduce(opPath, pathCompiler);
+        return op;
     }
-}
+
+    static Op transformReduce(Op op, PathCompiler pathCompiler) {
+        return Transformer.transform(
+                new TransformCopy() {
+                    @Override
+                    public Op transform(OpPath opPath) {
+                        // Flatten down to triples where possible.
+                        PathBlock pattern = pathCompiler.reduce(opPath.getTriplePath());
+                        // Any generated paths of exactly one to triple; convert to Op.
+                        Op op = PathLib.pathToTriples(pattern);
+                        return transformAlt(op, pathCompiler);
+                    }
+                },
+                op);
+    }
+
+    static Op transformAlt(Op op, PathCompiler pathCompiler) {
+        return Transformer.transform(new AltTransformer(pathCompiler), op);
+    }
+
+    private static class AltTransformer extends TransformCopy {
+        private PathCompiler pathCompiler;
+
+        public AltTransformer(PathCompiler pathCompiler) {
+            this.pathCompiler = pathCompiler;
+        }
+
+        @Override
+        public Op transform(OpPath opPath) {
+            TriplePath triplePath = opPath.getTriplePath();
+            Path path = triplePath.getPath();
+            if (path instanceof P_Alt) {
+                Node s = triplePath.getSubject();

Review Comment:
   Hm, The transform happens on the algebra level so all bnodes should have been renamed to variables with appropriate scoping - right?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] SimonBin commented on a diff in pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
SimonBin commented on code in PR #1616:
URL: https://github.com/apache/jena/pull/1616#discussion_r1023296087


##########
jena-arq/src/main/java/org/apache/jena/sparql/engine/iterator/QueryIterConcatIterator.java:
##########
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.jena.sparql.engine.iterator;
+
+import org.apache.jena.atlas.io.IndentedWriter;
+import org.apache.jena.atlas.lib.Lib;
+import org.apache.jena.sparql.engine.ExecutionContext;
+import org.apache.jena.sparql.engine.QueryIterator;
+import org.apache.jena.sparql.engine.binding.Binding;
+import org.apache.jena.sparql.serializer.SerializationContext;
+
+import java.util.*;
+
+
+/**
+ * A query iterator that joins two or more iterators into a single iterator. */ 
+
+public class QueryIterConcatIterator extends QueryIter

Review Comment:
   sorry, I don't understand. RepeatApply is an abstract class that takes a QueryIterator and repeatedly calls nextStage, how will that help here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan commented on a diff in pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
Aklakan commented on code in PR #1616:
URL: https://github.com/apache/jena/pull/1616#discussion_r1023314216


##########
jena-arq/src/main/java/org/apache/jena/sparql/engine/iterator/QueryIterConcatIterator.java:
##########
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.jena.sparql.engine.iterator;
+
+import org.apache.jena.atlas.io.IndentedWriter;
+import org.apache.jena.atlas.lib.Lib;
+import org.apache.jena.sparql.engine.ExecutionContext;
+import org.apache.jena.sparql.engine.QueryIterator;
+import org.apache.jena.sparql.engine.binding.Binding;
+import org.apache.jena.sparql.serializer.SerializationContext;
+
+import java.util.*;
+
+
+/**
+ * A query iterator that joins two or more iterators into a single iterator. */ 
+
+public class QueryIterConcatIterator extends QueryIter

Review Comment:
   You can rearrange the code in `PathLib` along the lines of first turning the Nodes to Bindings, and then wrap that as a QueryIterator which serves as input to QueryIterRepeatApply:
   
   ```java
   Iterator<Node> itNode = determineUngroundedStartingSet(graph, path, execCxt) ;
   Iterator<Binding> itBinding = Iter.iter(itNode).map(n -> BindingFactory.binding(binding, sVar, n));
   QueryIterator lhs = QueryIterPlainWrapper.wrap(itBinding);
   QueryIterator combined = new QueryIterRepeatApply(lhs, execCxt) {
       QueryIterator nextStage(Binding b2) {
           Iterator<Node> pathIter = PathEval.eval(graph, b2.get(sVar), path, getExecCxt().getContext());
           return evalGroundedOneEnd(b2, pathIter, oVar, getExecCxt());
       }
   };
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan commented on a diff in pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
Aklakan commented on code in PR #1616:
URL: https://github.com/apache/jena/pull/1616#discussion_r1023314216


##########
jena-arq/src/main/java/org/apache/jena/sparql/engine/iterator/QueryIterConcatIterator.java:
##########
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.jena.sparql.engine.iterator;
+
+import org.apache.jena.atlas.io.IndentedWriter;
+import org.apache.jena.atlas.lib.Lib;
+import org.apache.jena.sparql.engine.ExecutionContext;
+import org.apache.jena.sparql.engine.QueryIterator;
+import org.apache.jena.sparql.engine.binding.Binding;
+import org.apache.jena.sparql.serializer.SerializationContext;
+
+import java.util.*;
+
+
+/**
+ * A query iterator that joins two or more iterators into a single iterator. */ 
+
+public class QueryIterConcatIterator extends QueryIter

Review Comment:
   You can rearrange the code in `PathLib` along the lines of first turning the Nodes to Bindings, and then wrap that as a QueryIterator which serves as input to QueryIterRepeatApply:
   
   ```java
   Iterator<Node> itNode = determineUngroundedStartingSet(graph, path, execCxt) ;
   Iterator<Binding> itBinding = Iter.iter(itNode).map(n -> BindingFactory.binding(binding, sVar, n));
   QueryIterator lhs = QueryIterPlainWrapper.wrap(itBinding);
   QueryIterator combined = new QueryIterRepeatApply(lhs) {
     QueryIterator nextStage() {
                   Iterator<Node> pathIter = PathEval.eval(graph, b2.get(sVar), path, execCxt.getContext());
               return evalGroundedOneEnd(b2, pathIter, oVar, execCxt);
     }
   };
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] afs commented on pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
afs commented on PR #1616:
URL: https://github.com/apache/jena/pull/1616#issuecomment-1317721473

   Please fix commit 900dfc7676ba2d1e8a18d63c3c7d5eb5dc2a2dac - it has the whole original issue description (before it was fixed it up) and has people's names and ids in it.
   
   Put "GH-1616: " at the start of the commit message. It will then refer to the PR.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan commented on a diff in pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
Aklakan commented on code in PR #1616:
URL: https://github.com/apache/jena/pull/1616#discussion_r1023314216


##########
jena-arq/src/main/java/org/apache/jena/sparql/engine/iterator/QueryIterConcatIterator.java:
##########
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.jena.sparql.engine.iterator;
+
+import org.apache.jena.atlas.io.IndentedWriter;
+import org.apache.jena.atlas.lib.Lib;
+import org.apache.jena.sparql.engine.ExecutionContext;
+import org.apache.jena.sparql.engine.QueryIterator;
+import org.apache.jena.sparql.engine.binding.Binding;
+import org.apache.jena.sparql.serializer.SerializationContext;
+
+import java.util.*;
+
+
+/**
+ * A query iterator that joins two or more iterators into a single iterator. */ 
+
+public class QueryIterConcatIterator extends QueryIter

Review Comment:
   You can rearrange the code in `PathLib` along the lines of first turning the Nodes to Bindings, and then wrap that as a QueryIterator which serves as input to QueryIterRepeatApply:
   
   ```java
   Iterator<Node> itNode = determineUngroundedStartingSet(graph, path, execCxt) ;
   Iterator<Binding> itBinding = Iter.iter(itNode).map(n -> BindingFactory.binding(binding, sVar, n));
   QueryIterator lhs = QueryIterPlainWrapper.wrap(itBinding);
   QueryIterator combined = new QueryIterRepeatApply(lhs, execCxt) {
       QueryIterator nextStage(Binding b2) {
           Iterator<Node> pathIter = PathEval.eval(graph, b2.get(sVar), path, getExecCxt().getContext());
           return evalGroundedOneEnd(b2, pathIter, oVar, getExecCxt());
       }
   };
   ```
   The machinery should cleanly delegate all the close calls provided that `Iterator`s are closeable (using `Iter`).
   @afs That said - I realize I don't fully understand the difference between cancel() and close() - doesn't close imply cancel? Because `Iter` doesn't have this distinction - so what does the gap between `Iter` and `QueryIter` imply?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] afs commented on a diff in pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
afs commented on code in PR #1616:
URL: https://github.com/apache/jena/pull/1616#discussion_r1023218358


##########
jena-arq/src/main/java/org/apache/jena/sparql/algebra/optimize/TransformPathFlattern.java:
##########
@@ -53,9 +58,60 @@ public TransformPathFlattern(PathCompiler pathCompiler)
     @Override
     public Op transform(OpPath opPath)
     {
-        // Flatten down to triples where possible.
-        PathBlock pattern = pathCompiler.reduce(opPath.getTriplePath()) ;
-        // Any generated paths of exactly one to triple; convert to Op.
-        return PathLib.pathToTriples(pattern) ;
+        Op op = transformReduce(opPath, pathCompiler);
+        return op;
     }
-}
+
+    static Op transformReduce(Op op, PathCompiler pathCompiler) {
+        return Transformer.transform(
+                new TransformCopy() {
+                    @Override
+                    public Op transform(OpPath opPath) {
+                        // Flatten down to triples where possible.
+                        PathBlock pattern = pathCompiler.reduce(opPath.getTriplePath());
+                        // Any generated paths of exactly one to triple; convert to Op.
+                        Op op = PathLib.pathToTriples(pattern);
+                        return transformAlt(op, pathCompiler);
+                    }
+                },
+                op);
+    }
+
+    static Op transformAlt(Op op, PathCompiler pathCompiler) {
+        return Transformer.transform(new AltTransformer(pathCompiler), op);
+    }
+
+    private static class AltTransformer extends TransformCopy {
+        private PathCompiler pathCompiler;
+
+        public AltTransformer(PathCompiler pathCompiler) {
+            this.pathCompiler = pathCompiler;
+        }
+
+        @Override
+        public Op transform(OpPath opPath) {
+            TriplePath triplePath = opPath.getTriplePath();
+            Path path = triplePath.getPath();
+            if (path instanceof P_Alt) {
+                Node s = triplePath.getSubject();

Review Comment:
   > In my testing
   
   Which tests are you referring to? There don't seem to be any in the PR. 
   `TestTransformPathFlatten`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] SimonBin commented on a diff in pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
SimonBin commented on code in PR #1616:
URL: https://github.com/apache/jena/pull/1616#discussion_r1023249068


##########
jena-arq/src/main/java/org/apache/jena/sparql/engine/iterator/QueryIterConcatIterator.java:
##########
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.jena.sparql.engine.iterator;
+
+import org.apache.jena.atlas.io.IndentedWriter;
+import org.apache.jena.atlas.lib.Lib;
+import org.apache.jena.sparql.engine.ExecutionContext;
+import org.apache.jena.sparql.engine.QueryIterator;
+import org.apache.jena.sparql.engine.binding.Binding;
+import org.apache.jena.sparql.serializer.SerializationContext;
+
+import java.util.*;
+
+
+/**
+ * A query iterator that joins two or more iterators into a single iterator. */ 
+
+public class QueryIterConcatIterator extends QueryIter

Review Comment:
   should I delete this class and replace it with QueryIterPlainWrapper.create(Iter.flatMap(...))?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] afs commented on pull request #1616: calculate ungrounded path starting set lazily

Posted by GitBox <gi...@apache.org>.
afs commented on PR #1616:
URL: https://github.com/apache/jena/pull/1616#issuecomment-1327378287

   @simonbin - Thanks!
   
   I'll do a PR for better ungrounded starting based on this PR.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] afs merged pull request #1616: calculate ungrounded path starting set lazily

Posted by GitBox <gi...@apache.org>.
afs merged PR #1616:
URL: https://github.com/apache/jena/pull/1616


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] rvesse commented on a diff in pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
rvesse commented on code in PR #1616:
URL: https://github.com/apache/jena/pull/1616#discussion_r1022546486


##########
jena-arq/src/main/java/org/apache/jena/sparql/engine/iterator/QueryIterConcatIterator.java:
##########
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.jena.sparql.engine.iterator;
+
+import org.apache.jena.atlas.io.IndentedWriter;
+import org.apache.jena.atlas.lib.Lib;
+import org.apache.jena.sparql.engine.ExecutionContext;
+import org.apache.jena.sparql.engine.QueryIterator;
+import org.apache.jena.sparql.engine.binding.Binding;
+import org.apache.jena.sparql.serializer.SerializationContext;
+
+import java.util.*;
+
+
+/**
+ * A query iterator that joins two or more iterators into a single iterator. */ 
+
+public class QueryIterConcatIterator extends QueryIter

Review Comment:
   @afs Is there not already a query iterator that supported this pattern?
   
   I see a `QueryIterConcat` referenced in the code deletions so why was that not suitable?  Could that have been modified to solve the issue rather than introducing a new class



##########
jena-arq/src/main/java/org/apache/jena/sparql/algebra/optimize/TransformPathFlattern.java:
##########
@@ -32,7 +37,7 @@
  * For example,
  * <ul>
  * <li>Path seq {@literal ->} BGPs or a (sequence)
- * <li>"|" is not expanded into a union.
+ * <li>"|" is expanded into a union.

Review Comment:
   Need to add new optimiser test cases that verify the modified transform is having the desired effects, and to verify it doesn't make semantically unsafe transforms as noted in other comments



##########
jena-arq/src/main/java/org/apache/jena/sparql/algebra/optimize/TransformPathFlattern.java:
##########
@@ -53,9 +58,60 @@ public TransformPathFlattern(PathCompiler pathCompiler)
     @Override
     public Op transform(OpPath opPath)
     {
-        // Flatten down to triples where possible.
-        PathBlock pattern = pathCompiler.reduce(opPath.getTriplePath()) ;
-        // Any generated paths of exactly one to triple; convert to Op.
-        return PathLib.pathToTriples(pattern) ;
+        Op op = transformReduce(opPath, pathCompiler);
+        return op;
     }
-}
+
+    static Op transformReduce(Op op, PathCompiler pathCompiler) {
+        return Transformer.transform(
+                new TransformCopy() {
+                    @Override
+                    public Op transform(OpPath opPath) {
+                        // Flatten down to triples where possible.
+                        PathBlock pattern = pathCompiler.reduce(opPath.getTriplePath());
+                        // Any generated paths of exactly one to triple; convert to Op.
+                        Op op = PathLib.pathToTriples(pattern);
+                        return transformAlt(op, pathCompiler);
+                    }
+                },
+                op);
+    }
+
+    static Op transformAlt(Op op, PathCompiler pathCompiler) {
+        return Transformer.transform(new AltTransformer(pathCompiler), op);
+    }
+
+    private static class AltTransformer extends TransformCopy {
+        private PathCompiler pathCompiler;
+
+        public AltTransformer(PathCompiler pathCompiler) {
+            this.pathCompiler = pathCompiler;
+        }
+
+        @Override
+        public Op transform(OpPath opPath) {
+            TriplePath triplePath = opPath.getTriplePath();
+            Path path = triplePath.getPath();
+            if (path instanceof P_Alt) {
+                Node s = triplePath.getSubject();

Review Comment:
   So I don't believe this transformation is semantically safe in all cases.
   
   At a minimum it needs to consider whether either end of the path is a blank node variable as pushing those into a union will make them unique variables relative to the outside of the union and change the semantics of the query.
   
   i.e. a query like `SELECT * WHERE { ?s ex:foo / (ex:a | ex:b) ?o }` can have the first part of the path rewritten to yield `SELECT * WHERE { ?s ex:foo _:bnode . _:bnode (ex:a | ex:b) ?o }`, if you then rewrite the 2nd part into a union you change the semantics -> `SELECT * WHERE { ?s ex:foo _:bnode . { _:bnode ex:a ?o . } UNION { _:bnode ex:b ?o } }`
   
   From the [SPARQL 1.1 specification](https://www.w3.org/TR/sparql11-query/#QSynBlankNodes):
   
   > Blank nodes are indicated by either the label form, such as "_:abc", or the abbreviated form "[]". A blank node that is used in only one place in the query syntax can be indicated with []. A unique blank node will be used to form the triple pattern. Blank node labels are written as "_:abc" for a blank node with label "abc". The same blank node label cannot be used in two different basic graph patterns in the same query.
   
   So that's technically an illegal query you've generated. Evaluation wise those two `_:bnode` variables are now in different scopes and so rather than joining the results of the BGP and the Union you'll be generating a cross-product
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] afs commented on a diff in pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
afs commented on code in PR #1616:
URL: https://github.com/apache/jena/pull/1616#discussion_r1023271866


##########
jena-arq/src/main/java/org/apache/jena/sparql/algebra/optimize/TransformPathFlattern.java:
##########
@@ -53,9 +58,60 @@ public TransformPathFlattern(PathCompiler pathCompiler)
     @Override
     public Op transform(OpPath opPath)
     {
-        // Flatten down to triples where possible.
-        PathBlock pattern = pathCompiler.reduce(opPath.getTriplePath()) ;
-        // Any generated paths of exactly one to triple; convert to Op.
-        return PathLib.pathToTriples(pattern) ;
+        Op op = transformReduce(opPath, pathCompiler);
+        return op;
     }
-}
+
+    static Op transformReduce(Op op, PathCompiler pathCompiler) {
+        return Transformer.transform(

Review Comment:
   I'm confused - this is only called with an `OpPath` as the first argument.



##########
jena-arq/src/main/java/org/apache/jena/sparql/engine/iterator/QueryIterConcatIterator.java:
##########
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.jena.sparql.engine.iterator;
+
+import org.apache.jena.atlas.io.IndentedWriter;
+import org.apache.jena.atlas.lib.Lib;
+import org.apache.jena.sparql.engine.ExecutionContext;
+import org.apache.jena.sparql.engine.QueryIterator;
+import org.apache.jena.sparql.engine.binding.Binding;
+import org.apache.jena.sparql.serializer.SerializationContext;
+
+import java.util.*;
+
+
+/**
+ * A query iterator that joins two or more iterators into a single iterator. */ 
+
+public class QueryIterConcatIterator extends QueryIter

Review Comment:
   `QueryIterRepeatApply` if you want cancellation at that granularity.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] rvesse commented on pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
rvesse commented on PR #1616:
URL: https://github.com/apache/jena/pull/1616#issuecomment-1317272563

   Per @afs earlier comment there is actually an alternative path transform already in the code base that already covered more complex path flattening such as alternatives.
   
   I've opened PR #1620 to make it possible to toggle that transform on with the existing optimiser via an ARQ symbol leaving the existing simple transform unchanged.  That way users can opt into the more extensive path flattening if they wish but the existing default behaviour remains.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] SimonBin commented on a diff in pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
SimonBin commented on code in PR #1616:
URL: https://github.com/apache/jena/pull/1616#discussion_r1023316226


##########
jena-arq/src/main/java/org/apache/jena/sparql/engine/iterator/QueryIterConcatIterator.java:
##########
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.jena.sparql.engine.iterator;
+
+import org.apache.jena.atlas.io.IndentedWriter;
+import org.apache.jena.atlas.lib.Lib;
+import org.apache.jena.sparql.engine.ExecutionContext;
+import org.apache.jena.sparql.engine.QueryIterator;
+import org.apache.jena.sparql.engine.binding.Binding;
+import org.apache.jena.sparql.serializer.SerializationContext;
+
+import java.util.*;
+
+
+/**
+ * A query iterator that joins two or more iterators into a single iterator. */ 
+
+public class QueryIterConcatIterator extends QueryIter

Review Comment:
   would that have any advantage over or be significantly differnt from QueryIterPlainWrapper(Iter.flatMap)?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] SimonBin commented on a diff in pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
SimonBin commented on code in PR #1616:
URL: https://github.com/apache/jena/pull/1616#discussion_r1022784175


##########
jena-arq/src/main/java/org/apache/jena/sparql/algebra/optimize/TransformPathFlattern.java:
##########
@@ -53,9 +58,60 @@ public TransformPathFlattern(PathCompiler pathCompiler)
     @Override
     public Op transform(OpPath opPath)
     {
-        // Flatten down to triples where possible.
-        PathBlock pattern = pathCompiler.reduce(opPath.getTriplePath()) ;
-        // Any generated paths of exactly one to triple; convert to Op.
-        return PathLib.pathToTriples(pattern) ;
+        Op op = transformReduce(opPath, pathCompiler);
+        return op;
     }
-}
+
+    static Op transformReduce(Op op, PathCompiler pathCompiler) {
+        return Transformer.transform(
+                new TransformCopy() {
+                    @Override
+                    public Op transform(OpPath opPath) {
+                        // Flatten down to triples where possible.
+                        PathBlock pattern = pathCompiler.reduce(opPath.getTriplePath());
+                        // Any generated paths of exactly one to triple; convert to Op.
+                        Op op = PathLib.pathToTriples(pattern);
+                        return transformAlt(op, pathCompiler);
+                    }
+                },
+                op);
+    }
+
+    static Op transformAlt(Op op, PathCompiler pathCompiler) {
+        return Transformer.transform(new AltTransformer(pathCompiler), op);
+    }
+
+    private static class AltTransformer extends TransformCopy {
+        private PathCompiler pathCompiler;
+
+        public AltTransformer(PathCompiler pathCompiler) {
+            this.pathCompiler = pathCompiler;
+        }
+
+        @Override
+        public Op transform(OpPath opPath) {
+            TriplePath triplePath = opPath.getTriplePath();
+            Path path = triplePath.getPath();
+            if (path instanceof P_Alt) {
+                Node s = triplePath.getSubject();

Review Comment:
   Thank you for the thorough analysis and concern. In my testing, I got the expected results when querying e.g. ` { ?s ex:foo _:bnode . _:bnode (ex:a | ex:b) ?o }` as you suggested. Do you have any additional pointers how to verify the safety of the transform?
   
   Furthermore, the PathCompiler only adds OpSequences and the AltTransformer only adds OpUnions, so I think there is no possibility for a join to arise.
   
   (OpAsQuery would indeed generate the invalid `{ _:b0  <http://example.org/a>  ?o }
       UNION
         { _:b0  <http://example.org/b>  ?o }` but I think we should understand that the path syntax is more powerful that what could be possible to express on the SPARQL UNION syntax level)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] rvesse commented on a diff in pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
rvesse commented on code in PR #1616:
URL: https://github.com/apache/jena/pull/1616#discussion_r1022649485


##########
jena-arq/src/main/java/org/apache/jena/sparql/algebra/optimize/TransformPathFlattern.java:
##########
@@ -53,9 +58,60 @@ public TransformPathFlattern(PathCompiler pathCompiler)
     @Override
     public Op transform(OpPath opPath)
     {
-        // Flatten down to triples where possible.
-        PathBlock pattern = pathCompiler.reduce(opPath.getTriplePath()) ;
-        // Any generated paths of exactly one to triple; convert to Op.
-        return PathLib.pathToTriples(pattern) ;
+        Op op = transformReduce(opPath, pathCompiler);
+        return op;
     }
-}
+
+    static Op transformReduce(Op op, PathCompiler pathCompiler) {
+        return Transformer.transform(
+                new TransformCopy() {
+                    @Override
+                    public Op transform(OpPath opPath) {
+                        // Flatten down to triples where possible.
+                        PathBlock pattern = pathCompiler.reduce(opPath.getTriplePath());
+                        // Any generated paths of exactly one to triple; convert to Op.
+                        Op op = PathLib.pathToTriples(pattern);
+                        return transformAlt(op, pathCompiler);
+                    }
+                },
+                op);
+    }
+
+    static Op transformAlt(Op op, PathCompiler pathCompiler) {
+        return Transformer.transform(new AltTransformer(pathCompiler), op);
+    }
+
+    private static class AltTransformer extends TransformCopy {
+        private PathCompiler pathCompiler;
+
+        public AltTransformer(PathCompiler pathCompiler) {
+            this.pathCompiler = pathCompiler;
+        }
+
+        @Override
+        public Op transform(OpPath opPath) {
+            TriplePath triplePath = opPath.getTriplePath();
+            Path path = triplePath.getPath();
+            if (path instanceof P_Alt) {
+                Node s = triplePath.getSubject();

Review Comment:
   I was using the SPARQL syntax to illustrate the issue but this applies at the algebra nonetheless.
   
   This transform is happening after any scope renaming has happened.  If rewriting an alt to a `union` operator that's also introducing a `join` into the algebra which if there's a blank node variable involved breaks the semantics AFAICT



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan commented on a diff in pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
Aklakan commented on code in PR #1616:
URL: https://github.com/apache/jena/pull/1616#discussion_r1023314216


##########
jena-arq/src/main/java/org/apache/jena/sparql/engine/iterator/QueryIterConcatIterator.java:
##########
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.jena.sparql.engine.iterator;
+
+import org.apache.jena.atlas.io.IndentedWriter;
+import org.apache.jena.atlas.lib.Lib;
+import org.apache.jena.sparql.engine.ExecutionContext;
+import org.apache.jena.sparql.engine.QueryIterator;
+import org.apache.jena.sparql.engine.binding.Binding;
+import org.apache.jena.sparql.serializer.SerializationContext;
+
+import java.util.*;
+
+
+/**
+ * A query iterator that joins two or more iterators into a single iterator. */ 
+
+public class QueryIterConcatIterator extends QueryIter

Review Comment:
   You can rearrange the code in `PathLib` along the lines of first turning the Nodes to Bindings, and then wrap that as a QueryIterator which serves as input to QueryIterRepeatApply:
   
   ```java
   Iterator<Node> itNode = determineUngroundedStartingSet(graph, path, execCxt) ;
   Iterator<Binding> itBinding = Iter.iter(itNode).map(n -> BindingFactory.binding(binding, sVar, n));
   QueryIterator lhs = QueryIterPlainWrapper.wrap(itBinding);
   QueryIterator combined = new QueryIterRepeatApply(lhs, execCxt) {
       QueryIterator nextStage(Binding b2) {
           Iterator<Node> pathIter = PathEval.eval(graph, b2.get(sVar), path, getExecCxt().getContext());
           return evalGroundedOneEnd(b2, pathIter, oVar, getExecCxt());
       }
   };
   ```
   The machinery should cleanly delegate all the close calls provided that `Iterator`s are closeable (using `Iter`).
   That said - I realize I don't fully understand the difference between cancel() and close() - doesn't close imply cancel?
   Because `Iter` doesn't have this distinction - so what does the gap between `Iter` and `QueryIter` imply?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] SimonBin commented on a diff in pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
SimonBin commented on code in PR #1616:
URL: https://github.com/apache/jena/pull/1616#discussion_r1022784175


##########
jena-arq/src/main/java/org/apache/jena/sparql/algebra/optimize/TransformPathFlattern.java:
##########
@@ -53,9 +58,60 @@ public TransformPathFlattern(PathCompiler pathCompiler)
     @Override
     public Op transform(OpPath opPath)
     {
-        // Flatten down to triples where possible.
-        PathBlock pattern = pathCompiler.reduce(opPath.getTriplePath()) ;
-        // Any generated paths of exactly one to triple; convert to Op.
-        return PathLib.pathToTriples(pattern) ;
+        Op op = transformReduce(opPath, pathCompiler);
+        return op;
     }
-}
+
+    static Op transformReduce(Op op, PathCompiler pathCompiler) {
+        return Transformer.transform(
+                new TransformCopy() {
+                    @Override
+                    public Op transform(OpPath opPath) {
+                        // Flatten down to triples where possible.
+                        PathBlock pattern = pathCompiler.reduce(opPath.getTriplePath());
+                        // Any generated paths of exactly one to triple; convert to Op.
+                        Op op = PathLib.pathToTriples(pattern);
+                        return transformAlt(op, pathCompiler);
+                    }
+                },
+                op);
+    }
+
+    static Op transformAlt(Op op, PathCompiler pathCompiler) {
+        return Transformer.transform(new AltTransformer(pathCompiler), op);
+    }
+
+    private static class AltTransformer extends TransformCopy {
+        private PathCompiler pathCompiler;
+
+        public AltTransformer(PathCompiler pathCompiler) {
+            this.pathCompiler = pathCompiler;
+        }
+
+        @Override
+        public Op transform(OpPath opPath) {
+            TriplePath triplePath = opPath.getTriplePath();
+            Path path = triplePath.getPath();
+            if (path instanceof P_Alt) {
+                Node s = triplePath.getSubject();

Review Comment:
   Thank you for the thorough analysis and concern. In my testing, I got the expected results when querying e.g. ` { ?s ex:foo _:bnode . _:bnode (ex:a | ex:b) ?o }` as you suggested. Do you have any additional pointers how to verify the safety of the transform?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] afs commented on a diff in pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
afs commented on code in PR #1616:
URL: https://github.com/apache/jena/pull/1616#discussion_r1024546098


##########
jena-arq/src/main/java/org/apache/jena/sparql/engine/iterator/QueryIterConcatIterator.java:
##########
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.jena.sparql.engine.iterator;
+
+import org.apache.jena.atlas.io.IndentedWriter;
+import org.apache.jena.atlas.lib.Lib;
+import org.apache.jena.sparql.engine.ExecutionContext;
+import org.apache.jena.sparql.engine.QueryIterator;
+import org.apache.jena.sparql.engine.binding.Binding;
+import org.apache.jena.sparql.serializer.SerializationContext;
+
+import java.util.*;
+
+
+/**
+ * A query iterator that joins two or more iterators into a single iterator. */ 
+
+public class QueryIterConcatIterator extends QueryIter

Review Comment:
   > the difference between cancel() and close() 
   
   `cancel` is async e.g. query timeout. `close` is called from creating thread a a controlled point.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] afs commented on a diff in pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
afs commented on code in PR #1616:
URL: https://github.com/apache/jena/pull/1616#discussion_r1023731701


##########
jena-arq/src/main/java/org/apache/jena/sparql/engine/iterator/QueryIterConcatIterator.java:
##########
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.jena.sparql.engine.iterator;
+
+import org.apache.jena.atlas.io.IndentedWriter;
+import org.apache.jena.atlas.lib.Lib;
+import org.apache.jena.sparql.engine.ExecutionContext;
+import org.apache.jena.sparql.engine.QueryIterator;
+import org.apache.jena.sparql.engine.binding.Binding;
+import org.apache.jena.sparql.serializer.SerializationContext;
+
+import java.util.*;
+
+
+/**
+ * A query iterator that joins two or more iterators into a single iterator. */ 
+
+public class QueryIterConcatIterator extends QueryIter

Review Comment:
   > You can rearrange the code in `PathLib`
   
   There is also `QueryIterExtendByVar`.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] LorenzBuehmann commented on pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
LorenzBuehmann commented on PR #1616:
URL: https://github.com/apache/jena/pull/1616#issuecomment-1316590923

   > * It might be better to restrict optimization to the case of the specific and important case of `(property | property)`, not general path. This is mentioned on [JENA-2325](https://issues.apache.org/jira/browse/JENA-2325).
   
   I would be fine with this - currently only `p+` and `p*` are handled smarter, but very common usages of `p|q` or even `p|^p` are not covered at all, leading to worst case scenario with an unbound triple pattern by gathering as seed nodes all subjects and objects in the graph and collecting those in a set. Note, we're more and more working on larger dataset like Wikidata, thus, it's definitely something that won't work at all in that case.
   We also tried with at least handling basic cases of p_alt only contain p_oneormore, p_link or inverted p_links, this would cover probably most real world cases. It looks a bit clumsy still and maybe bringing the p_alt to some kind of normal form first would make it easier. @Aklakan also suggested to build an NFA, but I think we should keep it as simply as possible, we only need the first hop (or last to get objects) in the automaton to have seeds nodes. The rest of the evaluation I'd keep as is.  
   
   Also as a comment, the PR addressed multiple things afaik:
   
   - consider `LIMIT X` due to fully working on iterators, thus, being as lazy as possible
   - usage of QueryIterators such that the query execution context is used to be aware of query termination when a timeout happened
   - handling of p_alt


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] afs commented on pull request #1616: calculate ungrounded path starting set lazily

Posted by GitBox <gi...@apache.org>.
afs commented on PR #1616:
URL: https://github.com/apache/jena/pull/1616#issuecomment-1327302357

   Hi - could you squash the "AFS" commit into the  main one. I don't want my name in the commit log (the commit was originally an example!).
   
   Re: PathEval 
   Unlikely for the 4.7.0. Fortunately, that's a separable piece of work.
   
   I have some code for step 2 covering better starting sets of all the path operators I can see how to do.
   I believe optimizing these is more important.
   
   It will however require testing on larger data. I hope you group can do that for the cases you(plural) have raised.
   
   The objective is "better" not "perfect".
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] SimonBin commented on a diff in pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
SimonBin commented on code in PR #1616:
URL: https://github.com/apache/jena/pull/1616#discussion_r1022784175


##########
jena-arq/src/main/java/org/apache/jena/sparql/algebra/optimize/TransformPathFlattern.java:
##########
@@ -53,9 +58,60 @@ public TransformPathFlattern(PathCompiler pathCompiler)
     @Override
     public Op transform(OpPath opPath)
     {
-        // Flatten down to triples where possible.
-        PathBlock pattern = pathCompiler.reduce(opPath.getTriplePath()) ;
-        // Any generated paths of exactly one to triple; convert to Op.
-        return PathLib.pathToTriples(pattern) ;
+        Op op = transformReduce(opPath, pathCompiler);
+        return op;
     }
-}
+
+    static Op transformReduce(Op op, PathCompiler pathCompiler) {
+        return Transformer.transform(
+                new TransformCopy() {
+                    @Override
+                    public Op transform(OpPath opPath) {
+                        // Flatten down to triples where possible.
+                        PathBlock pattern = pathCompiler.reduce(opPath.getTriplePath());
+                        // Any generated paths of exactly one to triple; convert to Op.
+                        Op op = PathLib.pathToTriples(pattern);
+                        return transformAlt(op, pathCompiler);
+                    }
+                },
+                op);
+    }
+
+    static Op transformAlt(Op op, PathCompiler pathCompiler) {
+        return Transformer.transform(new AltTransformer(pathCompiler), op);
+    }
+
+    private static class AltTransformer extends TransformCopy {
+        private PathCompiler pathCompiler;
+
+        public AltTransformer(PathCompiler pathCompiler) {
+            this.pathCompiler = pathCompiler;
+        }
+
+        @Override
+        public Op transform(OpPath opPath) {
+            TriplePath triplePath = opPath.getTriplePath();
+            Path path = triplePath.getPath();
+            if (path instanceof P_Alt) {
+                Node s = triplePath.getSubject();

Review Comment:
   Thank you for the thorough analysis and concern. In my testing, I got the expected results when querying e.g. ` { ?s ex:foo _:bnode . _:bnode (ex:a | ex:b) ?o }` as you suggested. Do you have any additional pointers how to verify the safety of the transform?
   
   (OpAsQuery would indeed generate the invalid `{ _:b0  <http://example.org/a>  ?o }
       UNION
         { _:b0  <http://example.org/b>  ?o }` but I think we should understand that the path syntax is more powerful that what could be possible to express on the SPARQL UNION syntax level)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] SimonBin commented on a diff in pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
SimonBin commented on code in PR #1616:
URL: https://github.com/apache/jena/pull/1616#discussion_r1022787834


##########
jena-arq/src/main/java/org/apache/jena/sparql/engine/iterator/QueryIterConcatIterator.java:
##########
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.jena.sparql.engine.iterator;
+
+import org.apache.jena.atlas.io.IndentedWriter;
+import org.apache.jena.atlas.lib.Lib;
+import org.apache.jena.sparql.engine.ExecutionContext;
+import org.apache.jena.sparql.engine.QueryIterator;
+import org.apache.jena.sparql.engine.binding.Binding;
+import org.apache.jena.sparql.serializer.SerializationContext;
+
+import java.util.*;
+
+
+/**
+ * A query iterator that joins two or more iterators into a single iterator. */ 
+
+public class QueryIterConcatIterator extends QueryIter

Review Comment:
   the QueryIterConcat can be used to make a materialised list of many QueryIterators. In this case, the (nested) QueryIterators themselves should be constructed lazily, that's why I didn't see a good way to make use of the existing class 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] afs commented on a diff in pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
afs commented on code in PR #1616:
URL: https://github.com/apache/jena/pull/1616#discussion_r1023152201


##########
jena-arq/src/main/java/org/apache/jena/sparql/engine/iterator/QueryIterConcatIterator.java:
##########
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.jena.sparql.engine.iterator;
+
+import org.apache.jena.atlas.io.IndentedWriter;
+import org.apache.jena.atlas.lib.Lib;
+import org.apache.jena.sparql.engine.ExecutionContext;
+import org.apache.jena.sparql.engine.QueryIterator;
+import org.apache.jena.sparql.engine.binding.Binding;
+import org.apache.jena.sparql.serializer.SerializationContext;
+
+import java.util.*;
+
+
+/**
+ * A query iterator that joins two or more iterators into a single iterator. */ 
+
+public class QueryIterConcatIterator extends QueryIter

Review Comment:
   Isn't that then a flatmap? 
   `QueryIterRepeatApply` provides flatmap for `QueryIterator`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan commented on a diff in pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
Aklakan commented on code in PR #1616:
URL: https://github.com/apache/jena/pull/1616#discussion_r1023314216


##########
jena-arq/src/main/java/org/apache/jena/sparql/engine/iterator/QueryIterConcatIterator.java:
##########
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.jena.sparql.engine.iterator;
+
+import org.apache.jena.atlas.io.IndentedWriter;
+import org.apache.jena.atlas.lib.Lib;
+import org.apache.jena.sparql.engine.ExecutionContext;
+import org.apache.jena.sparql.engine.QueryIterator;
+import org.apache.jena.sparql.engine.binding.Binding;
+import org.apache.jena.sparql.serializer.SerializationContext;
+
+import java.util.*;
+
+
+/**
+ * A query iterator that joins two or more iterators into a single iterator. */ 
+
+public class QueryIterConcatIterator extends QueryIter

Review Comment:
   You can rearrange the code in `PathLib` along the lines of first turning the Nodes to Bindings, and then wrap that as a QueryIterator which serves as input to QueryIterRepeatApply:
   
   ```java
   Iterator<Node> itNode = determineUngroundedStartingSet(graph, path, execCxt) ;
   Iterator<Binding> itBinding = Iter.iter(itNode).map(n -> BindingFactory.binding(binding, sVar, n));
   QueryIterator lhs = QueryIterPlainWrapper.wrap(itBinding);
   QueryIterator combined = new QueryIterRepeatApply(lhs, execCxt) {
       QueryIterator nextStage(Binding b2) {
           Iterator<Node> pathIter = PathEval.eval(graph, b2.get(sVar), path, getExecCxt().getContext());
           return evalGroundedOneEnd(b2, pathIter, oVar, getExecCxt());
       }
   };
   ```
   
   I realize I don't fully understand the difference between cancel() and close() - doesn't close imply cancel?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan commented on a diff in pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
Aklakan commented on code in PR #1616:
URL: https://github.com/apache/jena/pull/1616#discussion_r1023314216


##########
jena-arq/src/main/java/org/apache/jena/sparql/engine/iterator/QueryIterConcatIterator.java:
##########
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.jena.sparql.engine.iterator;
+
+import org.apache.jena.atlas.io.IndentedWriter;
+import org.apache.jena.atlas.lib.Lib;
+import org.apache.jena.sparql.engine.ExecutionContext;
+import org.apache.jena.sparql.engine.QueryIterator;
+import org.apache.jena.sparql.engine.binding.Binding;
+import org.apache.jena.sparql.serializer.SerializationContext;
+
+import java.util.*;
+
+
+/**
+ * A query iterator that joins two or more iterators into a single iterator. */ 
+
+public class QueryIterConcatIterator extends QueryIter

Review Comment:
   You can rearrange the code in `PathLib` along the lines of first turning the Nodes to Bindings, and then wrap that as a QueryIterator which serves as input to QueryIterRepeatApply:
   
   ```java
   Iterator<Node> itNode = determineUngroundedStartingSet(graph, path, execCxt) ;
   Iterator<Binding> itBinding = Iter.iter(itNode).map(n -> BindingFactory.binding(binding, sVar, n));
   QueryIterator lhs = QueryIterPlainWrapper.wrap(itBinding);
   QueryIterator combined = new QueryIterRepeatApply(lhs) {
       QueryIterator nextStage(Binding b2, execCxt) {
           Iterator<Node> pathIter = PathEval.eval(graph, b2.get(sVar), path, getExecCxt().getContext());
           return evalGroundedOneEnd(b2, pathIter, oVar, getExecCxt());
       }
   };
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan commented on a diff in pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
Aklakan commented on code in PR #1616:
URL: https://github.com/apache/jena/pull/1616#discussion_r1023314216


##########
jena-arq/src/main/java/org/apache/jena/sparql/engine/iterator/QueryIterConcatIterator.java:
##########
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.jena.sparql.engine.iterator;
+
+import org.apache.jena.atlas.io.IndentedWriter;
+import org.apache.jena.atlas.lib.Lib;
+import org.apache.jena.sparql.engine.ExecutionContext;
+import org.apache.jena.sparql.engine.QueryIterator;
+import org.apache.jena.sparql.engine.binding.Binding;
+import org.apache.jena.sparql.serializer.SerializationContext;
+
+import java.util.*;
+
+
+/**
+ * A query iterator that joins two or more iterators into a single iterator. */ 
+
+public class QueryIterConcatIterator extends QueryIter

Review Comment:
   You can rearrange the code in `PathLib` along the lines of first turning the Nodes to Bindings, and then wrap that as a QueryIterator which serves as input to QueryIterRepeatApply:
   
   ```java
   Iterator<Node> itNode = determineUngroundedStartingSet(graph, path, execCxt) ;
   Iterator<Binding> itBinding = Iter.iter(itNode).map(n -> BindingFactory.binding(binding, sVar, n));
   QueryIterator lhs = QueryIterPlainWrapper.wrap(itBinding);
   QueryIterator combined = new QueryIterRepeatApply(lhs) {
       QueryIterator nextStage(Binding b2) {
           Iterator<Node> pathIter = PathEval.eval(graph, b2.get(sVar), path, execCxt.getContext());
           return evalGroundedOneEnd(b2, pathIter, oVar, execCxt);
       }
   };
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] namedgraph commented on pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
namedgraph commented on PR #1616:
URL: https://github.com/apache/jena/pull/1616#issuecomment-1314393589

   Was `TransformPathFlattern` meant to be called `TransformPathFlattener`?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] LorenzBuehmann commented on pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
LorenzBuehmann commented on PR #1616:
URL: https://github.com/apache/jena/pull/1616#issuecomment-1316798508

   > Part 1 of #1614 is also about a lack of reordering across paths and that isn't in this PR.
   
   True, but a query with only a single triple pattern having an alt_p would still benefit from smarter handling of basic property path cases to gather seed nodes. that was my point, we could easily add handle those without changing any other code but add more lines to method `PathLib::determineUngroundedStartingSet`
   
   It would be a huge performance gain already:
   Wikidata
   ```sparql
   SELECT * {
   ?s wd:P10564|wd:P4496 ?industry_code
   }
   ```
   where both properties are used just in 1746 and 997 triples compared to millions of subjects and objects being gathered and then evaluated otherwise 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan commented on a diff in pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
Aklakan commented on code in PR #1616:
URL: https://github.com/apache/jena/pull/1616#discussion_r1022619507


##########
jena-arq/src/main/java/org/apache/jena/sparql/algebra/optimize/TransformPathFlattern.java:
##########
@@ -53,9 +58,60 @@ public TransformPathFlattern(PathCompiler pathCompiler)
     @Override
     public Op transform(OpPath opPath)
     {
-        // Flatten down to triples where possible.
-        PathBlock pattern = pathCompiler.reduce(opPath.getTriplePath()) ;
-        // Any generated paths of exactly one to triple; convert to Op.
-        return PathLib.pathToTriples(pattern) ;
+        Op op = transformReduce(opPath, pathCompiler);
+        return op;
     }
-}
+
+    static Op transformReduce(Op op, PathCompiler pathCompiler) {
+        return Transformer.transform(
+                new TransformCopy() {
+                    @Override
+                    public Op transform(OpPath opPath) {
+                        // Flatten down to triples where possible.
+                        PathBlock pattern = pathCompiler.reduce(opPath.getTriplePath());
+                        // Any generated paths of exactly one to triple; convert to Op.
+                        Op op = PathLib.pathToTriples(pattern);
+                        return transformAlt(op, pathCompiler);
+                    }
+                },
+                op);
+    }
+
+    static Op transformAlt(Op op, PathCompiler pathCompiler) {
+        return Transformer.transform(new AltTransformer(pathCompiler), op);
+    }
+
+    private static class AltTransformer extends TransformCopy {
+        private PathCompiler pathCompiler;
+
+        public AltTransformer(PathCompiler pathCompiler) {
+            this.pathCompiler = pathCompiler;
+        }
+
+        @Override
+        public Op transform(OpPath opPath) {
+            TriplePath triplePath = opPath.getTriplePath();
+            Path path = triplePath.getPath();
+            if (path instanceof P_Alt) {
+                Node s = triplePath.getSubject();

Review Comment:
   Hm, the transform happens on the algebra level so all bnodes should have been renamed to variables with appropriate scoping - right?
   Also, I guess the transform is not correctly reversible with `OpAsQuery` but at this point in the optimizer it doesn't have to be.
   On the syntax level the transformation would certainly suffer from the issue which you mentioned.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] afs commented on pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
afs commented on PR #1616:
URL: https://github.com/apache/jena/pull/1616#issuecomment-1326480335

   @SimonBin - do you have time to return this PR into step 1 of #1629? (that is, remove the `TransformPathFlattern`, squash to one commit). Then step 2 can be based on code after the step 1 change.
   
   If you don't have time, say so and I'll put in a new PR starting from the diff of this one.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] SimonBin commented on a diff in pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
SimonBin commented on code in PR #1616:
URL: https://github.com/apache/jena/pull/1616#discussion_r1024298446


##########
jena-arq/src/main/java/org/apache/jena/sparql/algebra/optimize/TransformPathFlattern.java:
##########
@@ -53,9 +58,53 @@ public TransformPathFlattern(PathCompiler pathCompiler)
     @Override
     public Op transform(OpPath opPath)
     {
+        Op op = transformReduce(opPath, pathCompiler);
+        return op;
+    }
+
+    static Op transformReduce(OpPath opPath, PathCompiler pathCompiler) {
         // Flatten down to triples where possible.
-        PathBlock pattern = pathCompiler.reduce(opPath.getTriplePath()) ;
+        PathBlock pattern = pathCompiler.reduce(opPath.getTriplePath());
         // Any generated paths of exactly one to triple; convert to Op.
-        return PathLib.pathToTriples(pattern) ;
+        Op op = PathLib.pathToTriples(pattern);
+        return transformAlt(op, pathCompiler);

Review Comment:
   the suggestion of this change is as follows: we leave the original pathCompiler unchanged (`transformReduce`/pathCompiler.reduce) -- it can only flatten a path to PathBlocks and does not handle P_Alt. Then, we add another Transformer (transformAlt/AltTransformer) which only transforms any P_Alt to OpUnion



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] rvesse commented on pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
rvesse commented on PR #1616:
URL: https://github.com/apache/jena/pull/1616#issuecomment-1318422052

   > my concern would be that this completely different path flatterning in #1620 has never been put to wide testing and per the comment in the source file,
   
   And my point is that neither is yours.  You're changing the default behaviour for all existing users based on testing on your own datasets.
   
   As the recent flurry of issues around another new transform that was introduced has shown optimisations that benefit one dataset don't necessarily benefit all, and ultimately led to it being disabled by default (#1604).  Hence the suggestion of enabling an opt-in to the alternative transform rather than changing the existing default.
   
   > `It does not produce very nice execution structures so ARQ uses a functional equivalent, but different, transformation.`. One obvious instance from your tests, `:p1{2}` now becomes a join instead of a simple bgp (although I guess the other optimisers will turn that back into a sequence which maybe isn't quite as bad?)
   
   Yes the full optimiser converts joins of adjacent BGPs or triple operators back into a single BGP, expanded my test cases to explicitly cover this now.
   
   > 
   > Another thought about Std should be given w.r.t inverse paths like `^:p1+`, they are reversed by the current optimiser but remain in reverse in the Std implementation. This might then cause worse behaviour in the determineUngroundedStartingSet which does not handle them at the moment afaik (fall-back to allNodes)
   
   Well if both ends of the path are ungrounded whether we reverse it or not is irrelevant, and regardless we're conflating the issue of deficiencies in the path evaluator with algebra transformation.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] afs commented on a diff in pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
afs commented on code in PR #1616:
URL: https://github.com/apache/jena/pull/1616#discussion_r1024555652


##########
jena-arq/src/main/java/org/apache/jena/sparql/engine/iterator/QueryIterConcatIterator.java:
##########
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.jena.sparql.engine.iterator;
+
+import org.apache.jena.atlas.io.IndentedWriter;
+import org.apache.jena.atlas.lib.Lib;
+import org.apache.jena.sparql.engine.ExecutionContext;
+import org.apache.jena.sparql.engine.QueryIterator;
+import org.apache.jena.sparql.engine.binding.Binding;
+import org.apache.jena.sparql.serializer.SerializationContext;
+
+import java.util.*;
+
+
+/**
+ * A query iterator that joins two or more iterators into a single iterator. */ 
+
+public class QueryIterConcatIterator extends QueryIter

Review Comment:
   https://gist.github.com/afs/cb8e4d806d4efd9239682ec9220f936f



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] afs commented on a diff in pull request #1616: improve jena performance and behaviour in certain path cases

Posted by GitBox <gi...@apache.org>.
afs commented on code in PR #1616:
URL: https://github.com/apache/jena/pull/1616#discussion_r1025550633


##########
jena-arq/src/main/java/org/apache/jena/sparql/algebra/optimize/TransformPathFlattern.java:
##########
@@ -53,9 +58,53 @@ public TransformPathFlattern(PathCompiler pathCompiler)
     @Override
     public Op transform(OpPath opPath)
     {
+        Op op = transformReduce(opPath, pathCompiler);
+        return op;
+    }
+
+    static Op transformReduce(OpPath opPath, PathCompiler pathCompiler) {
         // Flatten down to triples where possible.
-        PathBlock pattern = pathCompiler.reduce(opPath.getTriplePath()) ;
+        PathBlock pattern = pathCompiler.reduce(opPath.getTriplePath());
         // Any generated paths of exactly one to triple; convert to Op.
-        return PathLib.pathToTriples(pattern) ;
+        Op op = PathLib.pathToTriples(pattern);
+        return transformAlt(op, pathCompiler);

Review Comment:
   > we leave the original pathCompiler unchanged
   
   I don't see that as an advantage or disadvantage.
   
   I am concerned about a build up of technical debt where the number of layers of functionality grows with implicit assumptions and dependencies on the previous layers layers - each step, it is the easiest solution point-wise but it never reflects on the overall effect which is overall complex code.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org