Posted to dev@tinkerpop.apache.org by "Stephen Mallette (Jira)" <ji...@apache.org> on 2022/06/16 10:40:00 UTC

[jira] [Closed] (TINKERPOP-2753) Create noop() step to avoid eager optimization

     [ https://issues.apache.org/jira/browse/TINKERPOP-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephen Mallette closed TINKERPOP-2753.
---------------------------------------
    Resolution: Won't Do

>  except I would assume the `noop()` step cannot be used as a terminal step.

{{identity()}} is not a terminal step so it seems they are identical.

> I would still consider it as a workaround, though. 

I think that if you're smart enough about your graph and Gremlin to know when to short-circuit an optimization, you should probably know how strategies affect the behavior of your traversal. Removing them to achieve some gain that they won't bring is just part of writing Gremlin in that case. In that sense, it feels like less of a workaround to me, but perhaps that's because I've used and recommended this approach before to solve this issue. Glad this approach works for you.

> Create noop() step to avoid eager optimization
> ----------------------------------------------
>
>                 Key: TINKERPOP-2753
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2753
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: process
>    Affects Versions: 3.6.0
>            Reporter: Boxuan Li
>            Priority: Major
>
> I only have experience in JanusGraph, so my opinion might be biased and this proposal might not be generalizable to other graph providers:
> I propose we create a `noop()` step that does nothing. It is a special step that simply provides a hint for the graph provider. How to interpret it depends on the graph provider, but the usage in my mind is to avoid eager optimization. Sometimes a graph provider can combine different filter steps into a joint condition for better index selection or predicate pushdown. For example, in the query below:
>  
> {code:java}
> g.V().has("name", "bob").has("age", 20){code}
>  
> JanusGraph will fold the two `has` conditions into a joint condition for better index selection. Sometimes, however, users don't want this "eager optimization", likely because they know the distribution of data and prefer doing in-memory filtering for the second `has` condition. They could do this:
>  
> {code:java}
> g.V().has("name", "bob").map(x -> x.get()).has("age", 20){code}
>  
> This way, JanusGraph defers the evaluation of the second condition until the first `has` condition has been evaluated. Here, the `map(x -> x.get())` is essentially a noop step. What I am proposing is an official `noop()` step to replace this workaround. The `noop` step sounds like a `barrier` step, but the two do not have the same semantics: `noop` is a barrier only against constraint look-ahead optimization.
>  
> Another example usage of `noop` is as follows:
>  
> {code:java}
> g.V(ids).bothE("follows").noop().where(__.otherV().is(v2)).next(){code}
>  
> In the above case, we can use `noop` to force the graph provider to compute `bothE` first and then evaluate the `where` step. Otherwise, the graph provider (for example, JanusGraph) might try folding the `where` condition into the `bothE` step for predicate pushdown. Predicate pushdown usually works well, but in some scenarios it is less preferred.
>  
> I am happy to provide a patch if the community likes this idea.
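
For readers who have not seen the folding behavior described in the issue, here is a minimal toy sketch in plain Java. It does not use TinkerPop or JanusGraph, and every name in it (FoldingSketch, Kind, Step, plan) is hypothetical. It assumes a planner that pushes consecutive leading has-style conditions into the initial scan and stops at the first step of any other kind, which is roughly the effect the issue attributes to the map(x -> x.get()) workaround. Real providers decide this inside their own traversal strategies, so treat this only as an illustration of the idea.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: none of these types are TinkerPop or JanusGraph classes.
class FoldingSketch {
    // Hypothetical step kinds: a filter condition, or a step that passes elements through unchanged.
    enum Kind { HAS, PASS_THROUGH }

    static final class Step {
        final Kind kind;
        final String condition; // e.g. "name=bob"; null for PASS_THROUGH

        Step(Kind kind, String condition) {
            this.kind = kind;
            this.condition = condition;
        }
    }

    static Step has(String condition) { return new Step(Kind.HAS, condition); }

    static Step passThrough() { return new Step(Kind.PASS_THROUGH, null); }

    // Builds a readable plan for the steps that follow an implicit initial scan.
    // Leading HAS steps are folded into the scan; the first non-HAS step ends the
    // folding, so every later HAS step stays a separate in-memory filter.
    static List<String> plan(List<Step> steps) {
        List<String> pushedDown = new ArrayList<>();
        List<String> remaining = new ArrayList<>();
        boolean folding = true;
        for (Step step : steps) {
            if (folding && step.kind == Kind.HAS) {
                pushedDown.add(step.condition);
            } else {
                folding = false;
                remaining.add(step.kind == Kind.HAS
                        ? "filter[" + step.condition + "]"
                        : "passThrough");
            }
        }
        List<String> plan = new ArrayList<>();
        plan.add("scan" + pushedDown);
        plan.addAll(remaining);
        return plan;
    }

    public static void main(String[] args) {
        // Both conditions are adjacent, so both are folded into the scan.
        System.out.println(plan(Arrays.asList(has("name=bob"), has("age=20"))));
        // The pass-through step separates them, so only the first is folded.
        System.out.println(plan(Arrays.asList(has("name=bob"), passThrough(), has("age=20"))));
    }
}
```

In this sketch the first plan has a single entry, scan[name=bob, age=20], while the second keeps three entries: scan[name=bob], passThrough, and filter[age=20]. That mirrors the two Gremlin queries in the issue, under the assumptions stated above.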



--
This message was sent by Atlassian Jira
(v8.20.7#820007)