Posted to dev@tinkerpop.apache.org by "Boxuan Li (Jira)" <ji...@apache.org> on 2022/06/16 02:05:00 UTC
[jira] [Commented] (TINKERPOP-2753) Create noop() step to avoid eager optimization

    [ https://issues.apache.org/jira/browse/TINKERPOP-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554861#comment-17554861 ] 

Boxuan Li commented on TINKERPOP-2753:
--------------------------------------

> Do you find that {{noop()}} is different from {{identity()}} in terms of what it does?

Yeah, I think they would behave essentially the same, except that I would assume the `noop()` step cannot be used as a terminal step. The semantics are the only notable difference between `noop` and `identity`.

What you offered is indeed a workable solution. In fact, I think it has the same semantic meaning as our current workaround, `map(x -> x.get())`. I would still consider it a workaround, though. But yeah, if the use case I described is rare, and especially if no other graph provider has the same problem, then it might not be worth adding a new step.
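
To make the folding behaviour from the issue description concrete, here is a minimal sketch in plain Java. It is a toy model only: `NoopFoldingSketch`, `Step`, `foldedPrefixLength` and `run` are hypothetical names invented for illustration, and none of this is TinkerPop or JanusGraph code. It assumes a provider that folds consecutive `has` conditions into one storage-level query and stops folding at the first `noop` marker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of provider-side folding; not TinkerPop or JanusGraph code.
class NoopFoldingSketch {

    // A step is either a has(key, value) condition or a noop marker (key == null).
    static final class Step {
        final String key;
        final Object value;

        Step(String key, Object value) {
            this.key = key;
            this.value = value;
        }

        boolean isNoop() {
            return key == null;
        }
    }

    static Step has(String key, Object value) {
        return new Step(key, value);
    }

    static Step noop() {
        return new Step(null, null);
    }

    // Number of leading has-conditions the toy planner would fold into a
    // single storage-level query. Folding stops at the first noop marker.
    static int foldedPrefixLength(List<Step> steps) {
        int n = 0;
        while (n < steps.size() && !steps.get(n).isNoop()) {
            n++;
        }
        return n;
    }

    // Evaluates every has-condition against each row. The noop marker never
    // filters anything, so it changes the plan but not the result.
    static List<Map<String, Object>> run(List<Map<String, Object>> rows, List<Step> steps) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            boolean matches = true;
            for (Step s : steps) {
                if (!s.isNoop() && !s.value.equals(row.get(s.key))) {
                    matches = false;
                    break;
                }
            }
            if (matches) {
                out.add(row);
            }
        }
        return out;
    }
}
```

In this model, `has("name", "bob"), has("age", 20)` folds both conditions into one query, while `has("name", "bob"), noop(), has("age", 20)` folds only the first and leaves the second to in-memory filtering. Both plans return the same rows, which is the sense in which the step "does nothing".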

> Create noop() step to avoid eager optimization
> ----------------------------------------------
>
>                 Key: TINKERPOP-2753
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2753
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: process
>    Affects Versions: 3.6.0
>            Reporter: Boxuan Li
>            Priority: Major
>
> I only have experience with JanusGraph, so my opinion might be biased and this proposal might not generalize to other graph providers:
> I propose we create a `noop()` step that does nothing. It is a special step that simply provides a hint for the graph provider. How to interpret it depends on the graph provider, but the usage in my mind is to avoid eager optimization. Sometimes a graph provider can combine different filter steps into a joint condition for better index selection or predicate pushdown. For example, in the query below:
>  
> {code:java}
> g.V().has("name", "bob").has("age", 20){code}
>  
> JanusGraph will fold the two `has` conditions into a joint condition for better index selection. Sometimes, however, users don't want this "eager optimization", likely because they know the distribution of data and prefer doing in-memory filtering for the second `has` condition. They could do this:
>  
> {code:java}
> g.V().has("name", "bob").map(x -> x.get()).has("age", 20){code}
>  
> This way, JanusGraph defers the evaluation of the second condition until the first `has` condition has been evaluated. Here, the `map(x -> x.get())` is essentially a noop step. What I am proposing is to use an official `noop()` step to replace this workaround. This `noop` step sounds like a `barrier` step, but the two do not have the same semantics. The `noop` step is a barrier against constraint look-ahead optimization.
>  
> Another example usage of `noop` is as follows:
>  
> {code:java}
> g.V(ids).bothE("follows").noop().where(__.otherV().is(v2)).next(){code}
>  
> In the above case, we can use `noop` to force the graph provider to compute `bothE` first and then evaluate the `where` step. Otherwise, the graph provider (for example, JanusGraph) might try folding the `where` condition into the `bothE` step for predicate pushdown. Predicate pushdown usually works well, but in some scenarios it is not the preferred plan.
>  
> I am happy to provide a patch if the community likes this idea.


