Posted to pr@jena.apache.org by GitBox <gi...@apache.org> on 2022/11/16 08:29:20 UTC

[GitHub] [jena] LorenzBuehmann commented on pull request #1616: improve jena performance and behaviour in certain path cases

LorenzBuehmann commented on PR #1616:
URL: https://github.com/apache/jena/pull/1616#issuecomment-1316590923

   > * It might be better to restrict optimization to the case of the specific and important case of `(property | property)`, not general path. This is mentioned on [JENA-2325](https://issues.apache.org/jira/browse/JENA-2325).
   
   I would be fine with this. Currently only `p+` and `p*` are handled in a smarter way, while very common usages such as `p|q` or even `p|^p` are not covered at all. With an unbound triple pattern this leads to the worst case: all subjects and objects in the graph are gathered as seed nodes and collected in a set. Note that we are increasingly working with larger datasets such as Wikidata, where this approach does not work at all.
   We also tried handling at least the basic cases where a p_alt contains only p_oneormore, p_link, or inverted p_link elements; this would probably cover most real-world cases. It still looks a bit clumsy, and bringing the p_alt into some kind of normal form first might make it easier. @Aklakan also suggested building an NFA, but I think we should keep it as simple as possible: to obtain seed nodes we only need the first hop in the automaton (or the last hop, to get objects). The rest of the evaluation I'd keep as is.
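
   To illustrate the first-hop idea, here is a rough sketch in Python rather than Jena code. The tuple-based path model and the names `first_hop_seeds` and `seeds_or_all_nodes` are made up for this example and do not correspond to classes or methods in the PR. It assumes a graph is just a list of `(s, p, o)` triples and that alternatives contain only links, inverse links, or `+` over those; anything else falls back to all nodes.

```python
from itertools import chain

# Toy path model (hypothetical, not Jena's Path classes):
#   ("link", p)        p
#   ("inv", p)         ^p   (inverse of a single predicate only)
#   ("plus", path)     path+
#   ("alt", [paths])   path1 | path2 | ...

def first_hop_seeds(triples, path):
    """Return a lazy iterator of candidate start nodes, or None if unsupported."""
    kind = path[0]
    if kind == "link":
        # Only subjects of triples with this predicate can start a match.
        return (s for s, p, o in triples if p == path[1])
    if kind == "inv":
        # For ^p the match starts at the object side.
        return (o for s, p, o in triples if p == path[1])
    if kind == "plus":
        # path+ needs at least one hop, so its first hop is that of path.
        return first_hop_seeds(triples, path[1])
    if kind == "alt":
        branches = [first_hop_seeds(triples, b) for b in path[1]]
        if any(b is None for b in branches):
            return None
        return chain.from_iterable(branches)
    # Anything else (e.g. p*, which also matches zero-length paths).
    return None

def distinct(nodes):
    """Drop duplicates lazily, keeping first-seen order."""
    seen = set()
    for n in nodes:
        if n not in seen:
            seen.add(n)
            yield n

def seeds_or_all_nodes(triples, path):
    """Use first-hop seeds when possible, else fall back to every node."""
    seeds = first_hop_seeds(triples, path)
    if seeds is None:
        seeds = chain.from_iterable((s, o) for s, p, o in triples)
    return distinct(seeds)
```

   With this, `p|^p` only touches triples that use `p`; the expensive "all subjects and objects" scan remains only as the fallback for shapes the sketch does not recognise.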
   
   Also as a comment, the PR addresses multiple things, afaik:
   
   - respecting `LIMIT X` by working entirely on iterators and thus being as lazy as possible
   - using QueryIterators so that the query execution context is used to detect query termination when a timeout occurs
   - handling of p_alt
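
   The first two points can be sketched together. Again this is Python with made-up names (`eval_one_or_more`, `QueryCancelled`), not the PR's code: a generator stands in for an iterator-based evaluation, `itertools.islice` stands in for `LIMIT`, and a `threading.Event` stands in for the cancellation signal that the query execution context would provide on timeout.

```python
import threading
from itertools import islice

class QueryCancelled(Exception):
    """Raised when evaluation notices that the query was cancelled."""

def eval_one_or_more(triples, pred, seeds, cancel):
    """Lazily yield (start, reached) pairs for pred+ from each seed node."""
    for start in seeds:
        seen, frontier = set(), [start]
        while frontier:
            # Check the cancellation flag before expanding each node.
            if cancel.is_set():
                raise QueryCancelled()
            node = frontier.pop()
            for s, p, o in triples:
                if s == node and p == pred and o not in seen:
                    seen.add(o)
                    frontier.append(o)
                    yield (start, o)

def first_n(results, limit):
    """Stand-in for LIMIT: stop pulling from the iterator after limit rows."""
    return list(islice(results, limit))
```

   Because nothing is materialised up front, `first_n(eval_one_or_more(...), 2)` stops after two results and never pulls seeds it does not need; setting the event makes the next expansion step raise instead of running to completion.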


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

