Posted to issues@jena.apache.org by GitBox <gi...@apache.org> on 2022/11/10 09:13:43 UTC

[GitHub] [jena] LorenzBuehmann opened a new issue, #1614: Property path handling in query optimizer and timeout handler

LorenzBuehmann opened a new issue, #1614:
URL: https://github.com/apache/jena/issues/1614

   ### Version
   
   4.7.0-SNAPSHOT
   
   ### Question
   
   Hi,
   
   - Jena 4.7.0-SNAPSHOT
   - TDB2
   - Fuseki
   
   ### Part 1
   
   We deployed the Wikidata truthy dump and hit a query which (probably) runs "forever":
   
   ``` sparql
   PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
   PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
   PREFIX wikibase: <http://wikiba.se/ontology#>
   PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
   
   SELECT ?item ?itemLabel WHERE {
     ?item (rdfs:label|skos:altLabel) ?itemLabel;
       rdf:type wikibase:Property.
     FILTER(CONTAINS(LCASE(?itemLabel), "border"@en))
   }
   LIMIT 10
   ```
   There are 7 results in total (as returned by the reordered query, which does terminate).
   
   You can see the usage of a property path in the first triple pattern. It looks like the BGP is never reordered here? We also computed the TDB stats:
   
   |         Pattern                     |         Count      |
   |------------------------------|---------------:|
   | `rdfs:label`                 | 510 188 728 |
   | `skos:altLabel`              | 102 554 842 |
   | `rdf:type wikibase:Property` | 9003        |
   
   The second triple pattern has about `9000` matches (9003 according to the stats), so it is fairly small.
   
   Reordering the triple patterns indeed helps, but the user would have to know this. I can imagine that estimating result sizes for property paths is difficult ...
   
   ### Part 2
   With the same query we noticed that the query execution timeout is never honored, so execution appears to sit in a branch where query cancellation isn't checked, maybe when calling `GraphUtils.allNodes(graph)` in `PathLib` (line 245)?
   
   Relevant part of the jstack dump:
   
   ```
   "qtp123458189-28" #28 prio=5 os_prio=0 cpu=45409.83ms elapsed=198.44s tid=0x00007f3831342000 nid=0x3b runnable  [0x00007f1eacefa000]
      java.lang.Thread.State: RUNNABLE
   	at java.io.RandomAccessFile.seek0(java.base@11.0.16.1/Native Method)
   	at java.io.RandomAccessFile.seek(java.base@11.0.16.1/RandomAccessFile.java:591)
   	at org.apache.jena.dboe.base.file.BinaryDataFileRandomAccess.seek(BinaryDataFileRandomAccess.java:95)
   	at org.apache.jena.dboe.base.file.BinaryDataFileRandomAccess.read(BinaryDataFileRandomAccess.java:71)
   	at org.apache.jena.dboe.base.file.BinaryDataFileWriteBuffered.read(BinaryDataFileWriteBuffered.java:121)
   	- locked <0x00007f202a7f73f8> (a java.lang.Object)
   	at org.apache.jena.dboe.trans.data.TransBinaryDataFile.read(TransBinaryDataFile.java:197)
   	at org.apache.jena.tdb2.store.nodetable.TReadAppendFileTransport.read(TReadAppendFileTransport.java:74)
   	at org.apache.thrift.transport.TTransport.readAll(TTransport.java:100)
   	at org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:622)
   	at org.apache.thrift.protocol.TCompactProtocol.readFieldBegin(TCompactProtocol.java:522)
   	at org.apache.thrift.TUnion$TUnionStandardScheme.read(TUnion.java:247)
   	at org.apache.thrift.TUnion$TUnionStandardScheme.read(TUnion.java:227)
   	at org.apache.thrift.TUnion.read(TUnion.java:145)
   	at org.apache.jena.tdb2.store.nodetable.NodeTableTRDF.readNodeFromTable(NodeTableTRDF.java:82)
   	at org.apache.jena.tdb2.store.nodetable.NodeTableNative._retrieveNodeByNodeId(NodeTableNative.java:102)
   	- locked <0x00007f203af76558> (a org.apache.jena.tdb2.store.nodetable.NodeTableTRDF)
   	at org.apache.jena.tdb2.store.nodetable.NodeTableNative.getNodeForNodeId(NodeTableNative.java:52)
   	at org.apache.jena.tdb2.store.nodetable.NodeTableCache._retrieveNodeByNodeId(NodeTableCache.java:208)
   	- locked <0x00007f203af6b168> (a java.lang.Object)
   	at org.apache.jena.tdb2.store.nodetable.NodeTableCache.getNodeForNodeId(NodeTableCache.java:133)
   	at org.apache.jena.tdb2.store.nodetable.NodeTableWrapper.getNodeForNodeId(NodeTableWrapper.java:52)
   	at org.apache.jena.tdb2.store.nodetable.NodeTableInline.getNodeForNodeId(NodeTableInline.java:65)
   	at org.apache.jena.tdb2.lib.TupleLib.triple(TupleLib.java:77)
   	at org.apache.jena.tdb2.lib.TupleLib.triple(TupleLib.java:66)
   	at org.apache.jena.tdb2.lib.TupleLib.lambda$convertToTriples$2(TupleLib.java:48)
   	at org.apache.jena.tdb2.lib.TupleLib$$Lambda$643/0x00007f1e464db4b0.apply(Unknown Source)
   	at org.apache.jena.atlas.iterator.Iter$IterMap.next(Iter.java:417)
   	at org.apache.jena.atlas.iterator.IteratorWrapper.next(IteratorWrapper.java:41)
   	at org.apache.jena.dboe.transaction.txn.IteratorTxnTracker.next(IteratorTxnTracker.java:39)
   	at org.apache.jena.atlas.iterator.Iter$IterMap.next(Iter.java:417)
   	at org.apache.jena.atlas.iterator.Iter$IterMap.next(Iter.java:417)
   	at org.apache.jena.atlas.iterator.Iter.next(Iter.java:1109)
   	at org.apache.jena.util.iterator.WrappedIterator.next(WrappedIterator.java:107)
   	at org.apache.jena.sparql.util.graph.GraphUtils.allNodes(GraphUtils.java:240)
   	at org.apache.jena.sparql.path.PathLib.determineUngroundedStartingSet(PathLib.java:245)
   	at org.apache.jena.sparql.path.PathLib.execUngroundedPath(PathLib.java:182)
   	at org.apache.jena.sparql.path.PathLib.execTriplePath(PathLib.java:128)
   	at org.apache.jena.sparql.path.PathLib.execTriplePath(PathLib.java:108)
   	at org.apache.jena.sparql.engine.iterator.QueryIterPath.nextStage(QueryIterPath.java:47)
   	at org.apache.jena.sparql.engine.iterator.QueryIterRepeatApply.makeNextStage(QueryIterRepeatApply.java:100)
   	at org.apache.jena.sparql.engine.iterator.QueryIterRepeatApply.hasNextBinding(QueryIterRepeatApply.java:60)
   	at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:116)
   	at org.apache.jena.sparql.engine.main.iterator.QueryIterGraph$QueryIterGraphInner.hasNextBinding(QueryIterGraph.java:121)
   	at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:116)
   	at org.apache.jena.sparql.engine.iterator.QueryIterRepeatApply.hasNextBinding(QueryIterRepeatApply.java:69)
   	at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:116)
   	at org.apache.jena.sparql.engine.iterator.QueryIterProcessBinding.hasNextBinding(QueryIterProcessBinding.java:77)
   	at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:116)
   	at org.apache.jena.tdb2.solver.OpExecutorTDB2.optimizeExecuteQuads(OpExecutorTDB2.java:227)
   	at org.apache.jena.tdb2.solver.OpExecutorTDB2.execute(OpExecutorTDB2.java:164)
   	at org.apache.jena.sparql.engine.main.ExecutionDispatch.visit(ExecutionDispatch.java:66)
   	at org.apache.jena.sparql.algebra.op.OpQuadPattern.visit(OpQuadPattern.java:87)
   	at org.apache.jena.sparql.engine.main.ExecutionDispatch.exec(ExecutionDispatch.java:46)
   	at org.apache.jena.sparql.engine.main.OpExecutor.exec(OpExecutor.java:119)
   	at org.apache.jena.tdb2.solver.OpExecutorTDB2.exec(OpExecutorTDB2.java:87)
   	at org.apache.jena.sparql.engine.main.OpExecutor.execute(OpExecutor.java:230)
   	at org.apache.jena.sparql.engine.main.ExecutionDispatch.visit(ExecutionDispatch.java:130)
   	at org.apache.jena.sparql.algebra.op.OpSequence.visit(OpSequence.java:75)
   	at org.apache.jena.sparql.engine.main.ExecutionDispatch.exec(ExecutionDispatch.java:46)
   	at org.apache.jena.sparql.engine.main.OpExecutor.exec(OpExecutor.java:119)
   	at org.apache.jena.tdb2.solver.OpExecutorTDB2.exec(OpExecutorTDB2.java:87)
   	at org.apache.jena.sparql.engine.main.OpExecutor.execute(OpExecutor.java:391)
   	at org.apache.jena.sparql.engine.main.ExecutionDispatch.visit(ExecutionDispatch.java:267)
   	at org.apache.jena.sparql.algebra.op.OpProject.visit(OpProject.java:47)
   	at org.apache.jena.sparql.engine.main.ExecutionDispatch.exec(ExecutionDispatch.java:46)
   	at org.apache.jena.sparql.engine.main.OpExecutor.exec(OpExecutor.java:119)
   	at org.apache.jena.tdb2.solver.OpExecutorTDB2.exec(OpExecutorTDB2.java:87)
   	at org.apache.jena.sparql.engine.main.OpExecutor.execute(OpExecutor.java:401)
   	at org.apache.jena.sparql.engine.main.ExecutionDispatch.visit(ExecutionDispatch.java:307)
   	at org.apache.jena.sparql.algebra.op.OpSlice.visit(OpSlice.java:50)
   	at org.apache.jena.sparql.engine.main.ExecutionDispatch.exec(ExecutionDispatch.java:46)
   	at org.apache.jena.sparql.engine.main.OpExecutor.exec(OpExecutor.java:119)
   	at org.apache.jena.tdb2.solver.OpExecutorTDB2.exec(OpExecutorTDB2.java:87)
   	at org.apache.jena.sparql.engine.main.OpExecutor.execute(OpExecutor.java:90)
   	at org.apache.jena.sparql.engine.main.QC.execute(QC.java:53)
   	at org.apache.jena.sparql.engine.main.QueryEngineMain.eval(QueryEngineMain.java:55)
   	at org.apache.jena.tdb2.solver.QueryEngineTDB.eval(QueryEngineTDB.java:108)
   	at org.apache.jena.sparql.engine.QueryEngineBase.evaluate(QueryEngineBase.java:171)
   	at org.apache.jena.sparql.engine.QueryEngineBase.createPlan(QueryEngineBase.java:130)
   	at org.apache.jena.sparql.engine.QueryEngineBase.getPlan(QueryEngineBase.java:112)
   	at org.apache.jena.tdb2.solver.QueryEngineTDB$QueryEngineFactoryTDB.create(QueryEngineTDB.java:138)
   	at org.apache.jena.sparql.engine.QueryEngineFactoryWrapper.create(QueryEngineFactoryWrapper.java:49)
   	at org.apache.jena.sparql.exec.QueryExecDataset.getPlan(QueryExecDataset.java:531)
   	at org.apache.jena.sparql.exec.QueryExecDataset.startQueryIterator(QueryExecDataset.java:494)
   	at org.apache.jena.sparql.exec.QueryExecDataset.execute(QueryExecDataset.java:173)
   	at org.apache.jena.sparql.exec.QueryExecDataset.select(QueryExecDataset.java:167)
   	at org.apache.jena.sparql.exec.QueryExecutionAdapter.execSelect(QueryExecutionAdapter.java:115)
   	at org.apache.jena.fuseki.servlets.SPARQLQueryProcessor.executeQuery(SPARQLQueryProcessor.java:374)
   	at org.apache.jena.fuseki.servlets.SPARQLQueryProcessor.execute(SPARQLQueryProcessor.java:279)
   	at org.apache.jena.fuseki.servlets.SPARQLQueryProcessor.executeWithParameter(SPARQLQueryProcessor.java:224)
   	at org.apache.jena.fuseki.servlets.SPARQLQueryProcessor.execute(SPARQLQueryProcessor.java:209)
   	at org.apache.jena.fuseki.servlets.ActionService.executeLifecycle(ActionService.java:58)
   	at org.apache.jena.fuseki.servlets.SPARQLQueryProcessor.execPost(SPARQLQueryProcessor.java:84)
   	at org.apache.jena.fuseki.servlets.ActionProcessor.process(ActionProcessor.java:34)
   	at org.apache.jena.fuseki.servlets.ActionBase.process(ActionBase.java:54)
   	at org.apache.jena.fuseki.servlets.ActionExecLib.execActionSub(ActionExecLib.java:124)
   	at org.apache.jena.fuseki.servlets.ActionExecLib.execAction(ActionExecLib.java:98)
   	at org.apache.jena.fuseki.server.Dispatcher.dispatchAction(Dispatcher.java:164)
   	at org.apache.jena.fuseki.server.Dispatcher.process(Dispatcher.java:156)
   	at org.apache.jena.fuseki.server.Dispatcher.dispatch(Dispatcher.java:83)
   	at org.apache.jena.fuseki.servlets.FusekiFilter.doFilter(FusekiFilter.java:48)
   	at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:202)
   	at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1635)
   	at org.apache.shiro.web.servlet.ProxiedFilterChain.doFilter(ProxiedFilterChain.java:61)
   	at org.apache.shiro.web.servlet.AdviceFilter.executeChain(AdviceFilter.java:108)
   	at org.apache.shiro.web.servlet.AdviceFilter.doFilterInternal(AdviceFilter.java:137)
   	at org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:154)
   	at org.apache.shiro.web.servlet.ProxiedFilterChain.doFilter(ProxiedFilterChain.java:66)
   	at org.apache.shiro.web.servlet.AdviceFilter.executeChain(AdviceFilter.java:108)
   	at org.apache.shiro.web.servlet.AdviceFilter.doFilterInternal(AdviceFilter.java:137)
   	at org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:154)
   	at org.apache.shiro.web.servlet.ProxiedFilterChain.doFilter(ProxiedFilterChain.java:66)
   	at org.apache.shiro.web.servlet.AbstractShiroFilter.executeChain(AbstractShiroFilter.java:458)
   	at org.apache.shiro.web.servlet.AbstractShiroFilter$1.call(AbstractShiroFilter.java:373)
   	at org.apache.shiro.subject.support.SubjectCallable.doCall(SubjectCallable.java:90)
   	at org.apache.shiro.subject.support.SubjectCallable.call(SubjectCallable.java:83)
   	at org.apache.shiro.subject.support.DelegatingSubject.execute(DelegatingSubject.java:387)
   	at org.apache.shiro.web.servlet.AbstractShiroFilter.doFilterInternal(AbstractShiroFilter.java:370)
   	at org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:154)
   	at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:202)
   	at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1635)
   	at org.apache.jena.fuseki.servlets.CrossOriginFilter.handle(CrossOriginFilter.java:344)
   	at org.apache.jena.fuseki.servlets.CrossOriginFilter.doFilter(CrossOriginFilter.java:298)
   	at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:210)
   	at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1635)
   	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:527)
   	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:131)
   	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:578)
   	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
   	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:223)
   	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1571)
   	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:221)
   	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1383)
   	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:176)
   	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:484)
   	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1544)
   	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:174)
   	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1305)
   	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:129)
   	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
   	at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:822)
   	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
   	at org.eclipse.jetty.server.Server.handle(Server.java:563)
   	at org.eclipse.jetty.server.HttpChannel.lambda$handle$0(HttpChannel.java:505)
   	at org.eclipse.jetty.server.HttpChannel$$Lambda$768/0x00007f1d7cda2440.dispatch(Unknown Source)
   	at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:762)
   	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:497)
   	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:282)
   	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:314)
   	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:100)
   	at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53)
   	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.runTask(AdaptiveExecutionStrategy.java:421)
   	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.consumeTask(AdaptiveExecutionStrategy.java:390)
   	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:277)
   	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.produce(AdaptiveExecutionStrategy.java:199)
   	at org.eclipse.jetty.io.ManagedSelector$$Lambda$725/0x00007f1d7d757058.run(Unknown Source)
   	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:933)
   	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1077)
   	at java.lang.Thread.run(java.base@11.0.16.1/Thread.java:829)
   ``` 
   
   
   Let me know if we can provide more details.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@jena.apache.org
For additional commands, e-mail: issues-help@jena.apache.org


[GitHub] [jena] afs commented on issue #1614: Property path handling in query optimizer and timeout handler

Posted by GitBox <gi...@apache.org>.
afs commented on issue #1614:
URL: https://github.com/apache/jena/issues/1614#issuecomment-1327383351

   Continued on GH-1629.




[GitHub] [jena] afs commented on issue #1614: Property path handling in query optimizer and timeout handler

Posted by GitBox <gi...@apache.org>.
afs commented on issue #1614:
URL: https://github.com/apache/jena/issues/1614#issuecomment-1310237230

   > I guess `UNION` clauses can't be optimized either
   
   The filter is being pushed in; there isn't any reordering of `rdf:type` (which is usually assumed to be a risky choice).
   
   ```
       (sequence
           (union
             (filter (contains (lcase ?itemLabel) "border"@en)
               (quadpattern (quad <urn:x-arq:DefaultGraphNode> ?item <http://www.w3.org/2000/01/rdf-schema#label> ?itemLabel)))
             (filter (contains (lcase ?itemLabel) "border"@en)
               (quadpattern (quad <urn:x-arq:DefaultGraphNode> ?item <http://www.w3.org/2004/02/skos/core#altLabel> ?itemLabel))))
           (quadpattern (quad <urn:x-arq:DefaultGraphNode> ?item <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://wikiba.se/ontology#Property>)))
   ```




[GitHub] [jena] afs commented on issue #1614: Property path handling in query optimizer and timeout handler

Posted by GitBox <gi...@apache.org>.
afs commented on issue #1614:
URL: https://github.com/apache/jena/issues/1614#issuecomment-1310233335

   > in https://github.com/apache/jena/blob/main/jena-arq/src/main/java/org/apache/jena/sparql/algebra/optimize/TransformPathFlattern.java#L35 there is a comment that P_Alt is not converted into OpUnion, but do you know why? What was the downside of this particular optimisation?
   
   No idea.
   
   It might be a bad idea if the expression on either side were complex.
   
   Seems reasonable if the LHS and RHS are simple.
   
   [JENA-2325](https://issues.apache.org/jira/browse/JENA-2325)
   Maybe: [JENA-1300](https://issues.apache.org/jira/browse/JENA-1300)
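
To make the "simple LHS and RHS" case concrete, here is a minimal sketch of rewriting a two-way alternative of plain predicates into a union of triple patterns, and declining to rewrite when either side is more complex. Everything in it is a hypothetical stand-in: `Path`, `Link`, `Alt`, `Seq` and `flattenAlt` are toy types invented for illustration, not Jena's `P_Alt`/`OpUnion` classes, and the output is just an algebra-like string.

```java
import java.util.Optional;

/** Toy model of flattening a two-way alternative path into a union. */
final class AltToUnion {
    /** Marker for a property path expression (hypothetical, not Jena's Path). */
    interface Path {}

    /** A single predicate step, e.g. rdfs:label. */
    static final class Link implements Path {
        final String iri;
        Link(String iri) { this.iri = iri; }
    }

    /** Alternative: left|right. */
    static final class Alt implements Path {
        final Path left;
        final Path right;
        Alt(Path left, Path right) { this.left = left; this.right = right; }
    }

    /** Sequence: left/right; stands in for any "complex" operand here. */
    static final class Seq implements Path {
        final Path left;
        final Path right;
        Seq(Path left, Path right) { this.left = left; this.right = right; }
    }

    /**
     * Returns an algebra-like union of two triple patterns when the path is an
     * alternative of two plain links; returns empty otherwise, meaning the path
     * is left untouched for the path evaluator.
     */
    static Optional<String> flattenAlt(String subject, Path path, String object) {
        if (!(path instanceof Alt)) {
            return Optional.empty();
        }
        Alt alt = (Alt) path;
        if (!(alt.left instanceof Link) || !(alt.right instanceof Link)) {
            return Optional.empty(); // complex operand: do not rewrite
        }
        String lhs = triple(subject, (Link) alt.left, object);
        String rhs = triple(subject, (Link) alt.right, object);
        return Optional.of("(union " + lhs + " " + rhs + ")");
    }

    private static String triple(String s, Link p, String o) {
        return "(bgp (triple " + s + " <" + p.iri + "> " + o + "))";
    }

    public static void main(String[] args) {
        Path labels = new Alt(
                new Link("http://www.w3.org/2000/01/rdf-schema#label"),
                new Link("http://www.w3.org/2004/02/skos/core#altLabel"));
        System.out.println(flattenAlt("?item", labels, "?itemLabel").orElse("(unchanged)"));
    }
}
```

Once the alternative is expressed as a union of ordinary patterns, it is visible to the same filter placement shown for the hand-written `UNION` query in this thread; whether that pays off for complex operands is exactly the open question above.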
   




[GitHub] [jena] afs commented on issue #1614: Property path handling in query optimizer and timeout handler

Posted by GitBox <gi...@apache.org>.
afs commented on issue #1614:
URL: https://github.com/apache/jena/issues/1614#issuecomment-1310761363

   > Without having looked into this in detail, maybe it's possible to mitigate this aspect of the raised issues by using a QueryIterDistinct.
   
   Should be possible.
   
   And possibly properly push `QueryIterator` usage into `PathEvaluator`, which is centered on plain iterators.




[GitHub] [jena] afs commented on issue #1614: Property path handling in query optimizer and timeout handler

Posted by GitBox <gi...@apache.org>.
afs commented on issue #1614:
URL: https://github.com/apache/jena/issues/1614#issuecomment-1310140269

   > Jena 4.7.0-SNAPSHOT
   
   Which date of 4.7.0-SNAPSHOT?
   Did other versions of Jena work?
   
   It might be the reordering; then again, it also does not seem to convert the path alternatives to a union.
   




[GitHub] [jena] LorenzBuehmann commented on issue #1614: Property path handling in query optimizer and timeout handler

Posted by GitBox <gi...@apache.org>.
LorenzBuehmann commented on issue #1614:
URL: https://github.com/apache/jena/issues/1614#issuecomment-1310230675

   ### Jena version
   ```
   tdb2.tdbquery --version                                                                                           
   Jena:       VERSION: 4.7.0-SNAPSHOT
   Jena:       BUILD_DATE: 2022-11-10T12:32:30Z
   ```
   
   ### Property path query
   ```
   tdb2.tdbquery --loc /data/coypu/tdb2/wikidata --explain "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
   PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
   PREFIX wikibase: <http://wikiba.se/ontology#>
   PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
   SELECT ?item ?itemLabel WHERE {
     ?item rdfs:label|skos:altLabel ?itemLabel ;                             
            rdf:type wikibase:Property.
     FILTER(CONTAINS(LCASE(?itemLabel), 'border'@en))
   }
   LIMIT 10"
   12:42:50 INFO  exec            :: QUERY
     PREFIX  skos: <http://www.w3.org/2004/02/skos/core#>
     PREFIX  rdfs: <http://www.w3.org/2000/01/rdf-schema#>
     PREFIX  rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
     PREFIX  wikibase: <http://wikiba.se/ontology#>
     
     SELECT  ?item ?itemLabel
     WHERE
       { ?item rdfs:label|skos:altLabel ?itemLabel .
         ?item  rdf:type  wikibase:Property
         FILTER contains(lcase(?itemLabel), "border"@en)
       }
     LIMIT   10
   12:42:50 INFO  exec            :: ALGEBRA
     (slice _ 10
       (project (?item ?itemLabel)
         (sequence
           (filter (contains (lcase ?itemLabel) "border"@en)
             (graph <urn:x-arq:DefaultGraphNode>
               (path ?item (alt <http://www.w3.org/2000/01/rdf-schema#label> <http://www.w3.org/2004/02/skos/core#altLabel>) ?itemLabel)))
           (quadpattern (quad <urn:x-arq:DefaultGraphNode> ?item <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://wikiba.se/ontology#Property>)))))
   12:42:50 INFO  exec            :: TDB2
     (slice _ 10
       (project (?item ?itemLabel)
         (sequence
           (filter (contains (lcase ?itemLabel) "border"@en)
             (graph <urn:x-arq:DefaultGraphNode>
               (path ?item (alt <http://www.w3.org/2000/01/rdf-schema#label> <http://www.w3.org/2004/02/skos/core#altLabel>) ?itemLabel)))
           (quadpattern (quad <urn:x-arq:DefaultGraphNode> ?item <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://wikiba.se/ontology#Property>)))))
   12:42:50 INFO  exec            :: TDB2
     (path ?item (alt <http://www.w3.org/2000/01/rdf-schema#label> <http://www.w3.org/2004/02/skos/core#altLabel>) ?itemLabel)
   12:42:50 INFO  exec            :: Path :: ?item <http://www.w3.org/2000/01/rdf-schema#label>|<http://www.w3.org/2004/02/skos/core#altLabel> ?itemLabel
   ```
   
   ### Union query
   I guess `UNION` clauses can't be optimized either, right? The rewritten query would be trivial to transform, but I guess it is not reordered either? For such tiny queries one could indeed estimate the size of the `UNION` (`|A| + |B|`) and optimize accordingly, but it always seems to be a tradeoff: for bigger BGPs inside the `UNION` it gets more complex.
   ```
   tdb2.tdbquery --loc /data/coypu/tdb2/wikidata --explain "
   PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
   PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
   PREFIX wikibase: <http://wikiba.se/ontology#>
   PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
   SELECT ?item ?itemLabel WHERE {
     {?item rdfs:label ?itemLabel } UNION {?item skos:altLabel ?itemLabel;}
     ?item  rdf:type wikibase:Property.
     FILTER(CONTAINS(LCASE(?itemLabel), 'border'@en))
   }
   LIMIT 10"
   12:37:15 INFO  exec            :: QUERY
     PREFIX  skos: <http://www.w3.org/2004/02/skos/core#>
     PREFIX  rdfs: <http://www.w3.org/2000/01/rdf-schema#>
     PREFIX  rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
     PREFIX  wikibase: <http://wikiba.se/ontology#>
     
     SELECT  ?item ?itemLabel
     WHERE
       {   { ?item  rdfs:label  ?itemLabel }
         UNION
           { ?item  skos:altLabel  ?itemLabel }
         ?item  rdf:type  wikibase:Property
         FILTER contains(lcase(?itemLabel), "border"@en)
       }
     LIMIT   10
   12:37:15 INFO  exec            :: ALGEBRA
     (slice _ 10
       (project (?item ?itemLabel)
         (sequence
           (union
             (filter (contains (lcase ?itemLabel) "border"@en)
               (quadpattern (quad <urn:x-arq:DefaultGraphNode> ?item <http://www.w3.org/2000/01/rdf-schema#label> ?itemLabel)))
             (filter (contains (lcase ?itemLabel) "border"@en)
               (quadpattern (quad <urn:x-arq:DefaultGraphNode> ?item <http://www.w3.org/2004/02/skos/core#altLabel> ?itemLabel))))
           (quadpattern (quad <urn:x-arq:DefaultGraphNode> ?item <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://wikiba.se/ontology#Property>)))))
   12:37:15 INFO  exec            :: TDB2
     (slice _ 10
       (project (?item ?itemLabel)
         (sequence
           (union
             (filter (contains (lcase ?itemLabel) "border"@en)
               (quadpattern (quad <urn:x-arq:DefaultGraphNode> ?item <http://www.w3.org/2000/01/rdf-schema#label> ?itemLabel)))
             (filter (contains (lcase ?itemLabel) "border"@en)
               (quadpattern (quad <urn:x-arq:DefaultGraphNode> ?item <http://www.w3.org/2004/02/skos/core#altLabel> ?itemLabel))))
           (quadpattern (quad <urn:x-arq:DefaultGraphNode> ?item <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://wikiba.se/ontology#Property>)))))
   12:37:15 INFO  exec            :: TDB2
     (filter (contains (lcase ?itemLabel) "border"@en)
       (quadpattern (quad <urn:x-arq:DefaultGraphNode> ?item <http://www.w3.org/2000/01/rdf-schema#label> ?itemLabel)))
   12:37:15 INFO  exec            :: Execute ::   ?item rdfs:label ?itemLabel
   12:37:15 INFO  exec            :: TDB2
     (filter (contains (lcase ?itemLabel) "border"@en)
       (quadpattern (quad <urn:x-arq:DefaultGraphNode> ?item <http://www.w3.org/2004/02/skos/core#altLabel> ?itemLabel)))
   12:37:15 INFO  exec            :: Execute ::   ?item <http://www.w3.org/2004/02/skos/core#altLabel> ?itemLabel
   12:37:28 INFO  exec            :: Execute ::   ?item rdf:type <http://wikiba.se/ontology#Property>
   ```
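
The `|A| + |B|` estimate mentioned in this comment can be sketched as follows: add the counts of the two single-pattern branches, then start the join with whichever operand has the smaller estimate. The counts are the TDB stats quoted in the issue description; `UnionEstimate`, `Operand`, `unionSize` and `order` are illustrative names only and are not part of Jena's reorder machinery.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Toy cardinality-based ordering for a join between a union and a pattern. */
final class UnionEstimate {
    /** A join operand together with its estimated number of matches. */
    static final class Operand {
        final String label;
        final long estimate;
        Operand(String label, long estimate) {
            this.label = label;
            this.estimate = estimate;
        }
    }

    /** Upper-bound estimate for a union of two branches: |A| + |B|. */
    static long unionSize(long a, long b) {
        return Math.addExact(a, b);
    }

    /** Returns a copy of the operands sorted so the smallest estimate comes first. */
    static List<Operand> order(List<Operand> operands) {
        List<Operand> sorted = new ArrayList<>(operands);
        sorted.sort(Comparator.comparingLong((Operand o) -> o.estimate));
        return sorted;
    }

    public static void main(String[] args) {
        // Counts taken from the stats table in the issue description.
        long labels = unionSize(510_188_728L, 102_554_842L);
        List<Operand> plan = order(Arrays.asList(
                new Operand("rdfs:label UNION skos:altLabel", labels),
                new Operand("rdf:type wikibase:Property", 9_003L)));
        for (Operand op : plan) {
            System.out.println(op.estimate + "\t" + op.label);
        }
    }
}
```

With these numbers the union is estimated at 612 743 570 matches against 9003 for the `rdf:type` pattern, which is why evaluating the type pattern first is so much cheaper here. The sketch ignores the filter and any correlation between the patterns, which is where the tradeoff for larger BGPs comes in.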




[GitHub] [jena] SimonBin commented on issue #1614: Property path handling in query optimizer and timeout handler

Posted by GitBox <gi...@apache.org>.
SimonBin commented on issue #1614:
URL: https://github.com/apache/jena/issues/1614#issuecomment-1310172264

   In https://github.com/apache/jena/blob/main/jena-arq/src/main/java/org/apache/jena/sparql/algebra/optimize/TransformPathFlattern.java#L35 there is a comment that `P_Alt` is not converted into `OpUnion`. Do you know why? What was the downside of this particular optimisation?




[GitHub] [jena] afs closed issue #1614: Property path handling in query optimizer and timeout handler

Posted by GitBox <gi...@apache.org>.
afs closed issue #1614: Property path handling in query optimizer and timeout handler
URL: https://github.com/apache/jena/issues/1614




[GitHub] [jena] Aklakan commented on issue #1614: Property path handling in query optimizer and timeout handler

Posted by GitBox <gi...@apache.org>.
Aklakan commented on issue #1614:
URL: https://github.com/apache/jena/issues/1614#issuecomment-1310277065

   Just a comment on this stacktrace snippet: @LorenzBuehmann mentioned that `GraphUtils.allNodes` attempts to load everything into memory, thereby ignoring query timeouts. Without having looked into this in detail, maybe it's possible to mitigate this issue by using a `QueryIterDistinct`.
   ```
   at org.apache.jena.sparql.util.graph.GraphUtils.allNodes(GraphUtils.java:240)
   	at org.apache.jena.sparql.path.PathLib.determineUngroundedStartingSet(PathLib.java:245)
   	at org.apache.jena.sparql.path.PathLib.execUngroundedPath(PathLib.java:182)
   	at org.apache.jena.sparql.path.PathLib.execTriplePath(PathLib.java:128)
   	at org.apache.jena.sparql.path.PathLib.execTriplePath(PathLib.java:108)
   	at org.apache.jena.sparql.engine.iterator.QueryIterPath.nextStage(QueryIterPath.java:47)
   ```
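
As a rough illustration of the kind of mitigation suggested here, the sketch below de-duplicates elements lazily and checks a cancellation flag before every read from the underlying source, instead of materialising the whole node set up front. It uses only `java.util` types; `CancellableDistinct` and its `AtomicBoolean` flag are hypothetical and do not reflect how `QueryIterDistinct` or Jena's timeout handling is actually implemented.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Set;
import java.util.concurrent.CancellationException;
import java.util.concurrent.atomic.AtomicBoolean;

/** Lazily yields distinct elements and aborts once the cancel flag is set. */
final class CancellableDistinct<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final AtomicBoolean cancelled;
    private final Set<T> seen = new HashSet<>();
    private T buffered;          // next unseen element, valid only if hasBuffered
    private boolean hasBuffered;

    CancellableDistinct(Iterator<T> source, AtomicBoolean cancelled) {
        this.source = source;
        this.cancelled = cancelled;
    }

    @Override
    public boolean hasNext() {
        // Pull from the source one element at a time until an unseen one appears.
        while (!hasBuffered && source.hasNext()) {
            if (cancelled.get()) {
                throw new CancellationException("query cancelled");
            }
            T candidate = source.next();
            if (seen.add(candidate)) {
                buffered = candidate;
                hasBuffered = true;
            }
        }
        return hasBuffered;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T result = buffered;
        buffered = null;
        hasBuffered = false;
        return result;
    }

    public static void main(String[] args) {
        AtomicBoolean cancel = new AtomicBoolean(false);
        Iterator<String> nodes = new CancellableDistinct<String>(
                Arrays.asList("s1", "o1", "s1", "o2").iterator(), cancel);
        while (nodes.hasNext()) {
            System.out.println(nodes.next());
        }
    }
}
```

In a setup like this a timeout handler running on another thread would only need to set the flag, and the loop stops at the next element read. The `seen` set still grows with the number of distinct nodes, so this addresses the missing cancellation check, not the memory cost.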

