You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@calcite.apache.org by Yuzhao Chen <yu...@gmail.com> on 2019/04/22 12:13:51 UTC

How to traverse RelNode’s parent conviniently?

Now for RelNode, we have method getInput()[1]  to fetch the input RelNodes, but how we fetch the parent ?

For example, we have plan:

      join-rel
    /             \
scan1     scan2


We can get scan1 and scan2 in join-rel directly with  method getInput, but how can we get the join rel in scan1 and scan 2 ?

I know that there is a RelShuttle that can visit every RelNode and if I make a cache for the inputs mapping, finally I can get the ‘parents’ from the cache, but this is boring code and not that intuitive.

Do you guys have any good ideas ?

[1] https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/RelNode.java#L132


Best,
Danny Chan

Re: How to traverse RelNode’s parent conviniently?

Posted by Stamatis Zampetakis <za...@gmail.com>.

This is a very interesting topic, thanks for starting this discussion Danny!

Metadata seems a very reasonable way to handle hints and I guess the
proposal by Julian is the way to go.

For the sake of discussion (brainstorming), I was thinking that rules from
hints do not differ too much.
The final goal is to allow end-users to affect the planning process for a
given query towards a particular plan. I guess this could be achieved by
registering custom rules that are taken into account by the planner (and
can be possibly ignored).

I admit that passing hints alongside the SQL query is convenient for
end-users so most likely we cannot avoid changing the grammar. Following
the rules idea, I was wondering if instead of attaching hints to the
SqlNode/RelNode(s) as metadata or so we could rather
enable/disable/generate rules for the planner.

A small example below:

/* +EnumerableHashJoinRule(e1,d1) */
SELECT e1.name
FROM emp e1
INNER JOIN dept d1 ON e1.did=d1.id
WHERE d1.name = 'HR'

The rule matches when the operator is a Join.class and left child is
aliased to e1 and right child is aliased to d1; it transforms the
LogicalJoin to an EnumerableHashJoin.

I haven't thought a lot on this but maybe this hint/rule based approach
gives some inspiration to somebody in this list.

Best,
Stamatis

On Sat, Apr 27, 2019 at 1:40 AM Yuzhao Chen <yu...@gmail.com> wrote:

> Thx Julian
>
> Mostly got your idea, but one thing needs to confirm:
>
> Now the MetadataHandler is kind of query lazy the cache is code-gen ed in
> the handler class, the metadata also propagate from inputs, when I got an
> RelNode’s hint, how can I cache it in the metadata handler for querying ?
>
> Best,
> Danny Chan
> 在 2019年4月27日 +0800 AM6:57，Julian Hyde <jh...@apache.org>，写道：
> > The RelMetadata system is designed for these kinds of annotations - if
> there is a “global hints cache” there’s no benefit to doing it outside the
> RelMetadata system.
> >
> > That said, I don’t know (and I don’t think anyone knows) how we want
> hints to be propagated as we generate RelNodes from RelNodes. I think we
> should focus on really simple cases first (e.g. hints about the whole
> query, or about particular table scans), and not try to automatically
> propagate them.
> >
> > We can make the hints propagation mechanism more sophisticated when we
> have an actual use case to drive us.
> >
> > Julian
> >
> >
> > > On Apr 26, 2019, at 3:41 PM, Yuzhao Chen <yu...@gmail.com> wrote:
> > >
> > > Thx, Julian
> > >
> > > Let me repeat my thoughts about the details again, in order to
> implement the hints, maybe these things are needed:
> > >
> > > The main diff is that we will maintain a global hints cache
> > > 1. Supports hints grammar for parser.jj
> > > 2. During/after sql-to-rel, we may pass a hints cache to the
> SqlToRelConverter, there is a visitor to setup/init the RelNodes’hints to
> the cache once at a time, this cache scope is global and would be active
> the whole query planning time. The cache only keep hints for few top nodes
> that really needs
> > > 3. In the Planner, add set/get hints cache method, so that in the
> planning rules, we can see the hints cache,
> > > And we can also ban some rule matching in the planner
> > > 4. Hook the RelOptCall#transformTo method to handle logic of hints
> propagating(invoke the hints logic again same as sql-to-rel phrase), this
> will also update the global hints cache
> > > It seems that given the global hints cache, we do not need the
> MetaDataHandler any more, this is the thing I most want to make sure.
> > > Hope for your suggestions.
> > >
> > > Best,
> > > Danny Chan
> > > 在 2019年4月25日 +0800 AM3:07，Julian Hyde <jh...@apache.org>，写道：
> > > > I think it’s OK to attach hints to the (few) RelNodes that come out
> of the SqlToRelConverter.
> > > >
> > > > But it would be a mistake to try to propagate those hints to all of
> the RelNodes that are created during query planning. Even if we changed all
> of the copy methods (a huge task) there are many other ways that RelNodes
> get created. We would end up with a RelNode graph with lots of hints, and
> most of those hints would be inaccurate or not applicable.
> > > >
> > > > For a particular hint, say "/*+ nohashjoin */“, some piece of code
> would need to look at the initial RelNode tree and take its own action:
> say, build a data structure to be used by planner rules, or enable or
> disable planner rules.
> > > >
> > > >
> > > > > On Apr 23, 2019, at 9:31 PM, Chunwei Lei <ch...@gmail.com>
> wrote:
> > > > >
> > > > > Thanks Danny.
> > > > >
> > > > > Those are good points. I think it depends on what we consider hint
> as.
> > > > > IMHO, if we consider hint as a kind of metadata,
> > > > > it is not a good idea to store the hints in the RelNode instance.
> > > > >
> > > > >
> > > > >
> > > > > Best,
> > > > > Chunwei
> > > > >
> > > > > On Wed, Apr 24, 2019 at 11:09 AM Yuzhao Chen <yu...@gmail.com>
> wrote:
> > > > > >
> > > > > > Thx, Julian
> > > > > >
> > > > > > I think the hint path is a good way for searching RelNode’s
> parents, broadly, there may be these modules/things need to be modified:
> > > > > >
> > > > > > 1. Supports hints grammar for parser.jj
> > > > > > 2. Cache the hints in the RelNode instance, and add method like
> RelNode#getHints() to fetch all the hints inherited for this node.
> > > > > > 3. Modify #copy method for every kind of RelNode so that the
> hints can be copied when creating new equivalent nodes.
> > > > > > 4. Add a visitor in after sql-to-rel phrase, to set up full
> hints list for every children RelNode if there exists any.
> > > > > > 5. Add hints metadata handler and handles the hints fetching and
> overriding for specific kind of RelNode
> > > > > >
> > > > > > The 2 and 3 are the modifications that i really want to confirm,
> that is, shall we store the hints in the RelNode instance ?
> > > > > >
> > > > > > These are initial thoughts and if we make agreement, I would
> output a detail design doc which contains:
> > > > > >
> > > > > > 1. The hints grammar supported for the major sql engines
> > > > > > 2. The hints grammar supported for Apache Calcite
> > > > > > 3. The interface and design ideas of the proposed modifications
> > > > > >
> > > > > >
> > > > > > Best,
> > > > > > Danny Chan
> > > > > > 在 2019年4月24日 +0800 AM3:04，Julian Hyde <jh...@apache.org>，写道：
> > > > > > > I see that if you have a hint on, say, the root node then it
> would be nice for its child or grand-child to be able to see that hint.
> > > > > > >
> > > > > > > How about giving each hint an inherit path? Thus given
> > > > > > >
> > > > > > > Filter Hint1
> > > > > > > +- Join
> > > > > > > +- Scan
> > > > > > > +- Project Hint2
> > > > > > > +- Scan
> > > > > > >
> > > > > > >
> > > > > > > Filter would have hints {Hint1[]}
> > > > > > > Join would have hints {Hint1[0]}
> > > > > > > Scan would have hints {Hint1[0, 0]}
> > > > > > > Project would have hints {Hint1[0,1], Hint2}
> > > > > > > Scan2 would have hints {[Hint1[0, 0, 1, 0], Hint2[0]}
> > > > > > >
> > > > > > > You could populate the hints and inherit paths with a single
> visitor pass after sql-to-rel conversion.
> > > > > > >
> > > > > > > By the way, I still like the idea of having kinds as a kind of
> RelMetadata, but I realize that a given RelNode might have more than one
> hint. So I think that the getHints(RelNode) method would return a
> List<Hint>, with Hint as follows:
> > > > > > >
> > > > > > > class Hint {
> > > > > > > public final List<Integer> inheritPath; // immutable, not null
> > > > > > > public final String type; // not null
> > > > > > > public final Object operand; // immutable, may be null, must
> be JSON data
> > > > > > > }
> > > > > > >
> > > > > > > operand must be JSON-style data (null, boolean, number,
> String, immutable List of JSON data, or immutable order-preserving Map from
> String to JSON data).
> > > > > > >
> > > > > > > > On Apr 23, 2019, at 1:25 AM, Yuzhao Chen <
> yuzhao.cyz@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Thx, Andrew
> > > > > > > >
> > > > > > > > I don’t want to have a custom RelNode class, I hope all the
> work about hints would be contributed to the community. I want to find an
> acceptable way to keep and propagate the hints if we use the
> MetadataHandler to cache and query the hints.
> > > > > > > >
> > > > > > > > I don’t think the hints should be mixed into the cost model,
> that would make the cost computation very complex and hard to maintain, we
> only need the hints in our planning phrase to give suggestions, hints is
> more like another guideline for me and transparent to the planner.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Danny Chan
> > > > > > > > 在 2019年4月23日 +0800 PM2:24，Андрей Цвелодуб <
> a.tsvelodub@gmail.com>，写道：
> > > > > > > > > Hi Danny,
> > > > > > > > >
> > > > > > > > > I would also agree with Julian on his position. I've tried
> to get around
> > > > > > > > > this limitation in several different ways, but none of it
> ended well :)
> > > > > > > > >
> > > > > > > > > For your idea with hints, if you have custom RelNode
> classes, you can add
> > > > > > > > > hint as an additional field of the class and you can write
> a simple rule
> > > > > > > > > that propagates the hint downwards, step by step. And also
> include the hint
> > > > > > > > > in your cost estimation, so that nodes with hints would be
> more attractive
> > > > > > > > > to the planner. I'm not sure this would be the most
> correct way to use the
> > > > > > > > > cost mechanism, but at least it is straightforward and it
> works.
> > > > > > > > >
> > > > > > > > > Best Regards,
> > > > > > > > > Andrew Tsvelodub
> > > > > > > > >
> > > > > > > > > On Tue, 23 Apr 2019 at 08:44, Yuzhao Chen <
> yuzhao.cyz@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Julian,
> > > > > > > > > >
> > > > > > > > > > I want to add hint support for Calcite, the initial idea
> was to tag a
> > > > > > > > > > RelNode(transformed from a SqlNode with hint) with a hit
> attribute(or
> > > > > > > > > > trait), then I hope that the children (inputs) of it can
> see this hint, so
> > > > > > > > > > to make some decisions if it should consume or propagate
> the hint.
> > > > > > > > > >
> > > > > > > > > > The problem I got here is the trait propagate from
> inputs from, which is
> > > > > > > > > > the opposite as what I need, can you give some
> suggestions ? If I use
> > > > > > > > > > MetadataHandler to cache and propagate the hints, how to
> propagate from
> > > > > > > > > > parents to children ?
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Danny Chan
> > > > > > > > > > 在 2019年4月23日 +0800 AM3:14，Julian Hyde <jhyde@apache.org
> >，写道：
> > > > > > > > > > > TL;DR: RelNodes don’t really have parents. Be careful
> if you are relying
> > > > > > > > > > on the parent concept too much. Rely on rules instead.
> > > > > > > > > > >
> > > > > > > > > > > In the Volcano model, a RelNode doesn’t really have a
> parent. It might
> > > > > > > > > > be used in several places. (RelSet has a field
> ‘List<RelNode> parents’ that
> > > > > > > > > > is kept up to date as planing progresses. But it’s
> really for Volcano’s
> > > > > > > > > > internal use.)
> > > > > > > > > > >
> > > > > > > > > > > Even if you are not using Volcano, there are reasons
> to want the RelNode
> > > > > > > > > > graph to be a dag, so again, a RelNode doesn’t have a
> unique parent.
> > > > > > > > > > >
> > > > > > > > > > > RelShuttleImpl has a stack. You can use that to find
> the parent. But the
> > > > > > > > > > “parent” is just “where we came from as we traversed the
> RelNode graph”.
> > > > > > > > > > There may be other “parents” that you do not know about.
> > > > > > > > > > >
> > > > > > > > > > > If you have a Project and want to find all parents
> that are Filters,
> > > > > > > > > > don’t even think about “iterating over the parents” of
> the Project. Just
> > > > > > > > > > write a rule that matches a Filter on a Project, and
> trust Volcano to do
> > > > > > > > > > its job.
> > > > > > > > > > >
> > > > > > > > > > > Julian
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > On Apr 22, 2019, at 6:15 AM, Yuzhao Chen <
> yuzhao.cyz@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Thx, Stamatis, that somehow make sense, if i pass
> around the parent
> > > > > > > > > > node every time I visit a RelNode and keep the parents
> in the cache, but it
> > > > > > > > > > is still not that intuitive. Actually I what a to add a
> new RelTrait which
> > > > > > > > > > bind to a specific scope, for example:
> > > > > > > > > > > >
> > > > > > > > > > > > join-rel(trait1)
> > > > > > > > > > > > / \
> > > > > > > > > > > > join2 join3
> > > > > > > > > > > >
> > > > > > > > > > > > Join-rel has a trait trait1, and I want all the
> children of join-rel
> > > > > > > > > > can see this trait, with Calcite’s default metadata
> handler, I can only see
> > > > > > > > > > the trait from children nodes(traits propagate from the
> inputs), and I have
> > > > > > > > > > no idea how to propagate a trait reversely?
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Best,
> > > > > > > > > > > > Danny Chan
> > > > > > > > > > > > 在 2019年4月22日 +0800 PM8:44，Stamatis Zampetakis <
> zabetak@gmail.com>，写道：
> > > > > > > > > > > > > Hi Danny,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Apart from RelShuttle there is also RelVisitor
> which has a visit
> > > > > > > > > > method
> > > > > > > > > > > > > that provides the parent [1]. Not sure, if it
> suits your needs.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best,
> > > > > > > > > > > > > Stamatis
> > > > > > > > > > > > >
> > > > > > > > > > > > > [1]
> > > > > > > > > > > > >
> > > > > > > > > >
> https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/RelVisitor.java#L43
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Apr 22, 2019 at 2:14 PM Yuzhao Chen <
> yuzhao.cyz@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Now for RelNode, we have method getInput()[1] to
> fetch the input
> > > > > > > > > > > > > > RelNodes, but how we fetch the parent ?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > For example, we have plan:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > join-rel
> > > > > > > > > > > > > > / \
> > > > > > > > > > > > > > scan1 scan2
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > We can get scan1 and scan2 in join-rel directly
> with method
> > > > > > > > > > getInput, but
> > > > > > > > > > > > > > how can we get the join rel in scan1 and scan 2 ?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I know that there is a RelShuttle that can visit
> every RelNode and
> > > > > > > > > > if I
> > > > > > > > > > > > > > make a cache for the inputs mapping, finally I
> can get the
> > > > > > > > > > ‘parents’ from
> > > > > > > > > > > > > > the cache, but this is boring code and not that
> intuitive.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Do you guys have any good ideas ?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > >
> > > > > > > > > >
> https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/RelNode.java#L132
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > Danny Chan
> > > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > >
> > > >
> >
>

Re: How to traverse RelNode’s parent conviniently?

Posted by Yuzhao Chen <yu...@gmail.com>.

Thx Julian

Mostly got your idea, but one thing needs to confirm:

Now the MetadataHandler is kind of query lazy the cache is code-gen ed in the handler class, the metadata also propagate from inputs, when I got an RelNode’s hint, how can I cache it in the metadata handler for querying ?

Best,
Danny Chan
在 2019年4月27日 +0800 AM6:57，Julian Hyde <jh...@apache.org>，写道：
> The RelMetadata system is designed for these kinds of annotations - if there is a “global hints cache” there’s no benefit to doing it outside the RelMetadata system.
>
> That said, I don’t know (and I don’t think anyone knows) how we want hints to be propagated as we generate RelNodes from RelNodes. I think we should focus on really simple cases first (e.g. hints about the whole query, or about particular table scans), and not try to automatically propagate them.
>
> We can make the hints propagation mechanism more sophisticated when we have an actual use case to drive us.
>
> Julian
>
>
> > On Apr 26, 2019, at 3:41 PM, Yuzhao Chen <yu...@gmail.com> wrote:
> >
> > Thx, Julian
> >
> > Let me repeat my thoughts about the details again, in order to implement the hints, maybe these things are needed:
> >
> > The main diff is that we will maintain a global hints cache
> > 1. Supports hints grammar for parser.jj
> > 2. During/after sql-to-rel, we may pass a hints cache to the SqlToRelConverter, there is a visitor to setup/init the RelNodes’hints to the cache once at a time, this cache scope is global and would be active the whole query planning time. The cache only keep hints for few top nodes that really needs
> > 3. In the Planner, add set/get hints cache method, so that in the planning rules, we can see the hints cache,
> > And we can also ban some rule matching in the planner
> > 4. Hook the RelOptCall#transformTo method to handle logic of hints propagating(invoke the hints logic again same as sql-to-rel phrase), this will also update the global hints cache
> > It seems that given the global hints cache, we do not need the MetaDataHandler any more, this is the thing I most want to make sure.
> > Hope for your suggestions.
> >
> > Best,
> > Danny Chan
> > 在 2019年4月25日 +0800 AM3:07，Julian Hyde <jh...@apache.org>，写道：
> > > I think it’s OK to attach hints to the (few) RelNodes that come out of the SqlToRelConverter.
> > >
> > > But it would be a mistake to try to propagate those hints to all of the RelNodes that are created during query planning. Even if we changed all of the copy methods (a huge task) there are many other ways that RelNodes get created. We would end up with a RelNode graph with lots of hints, and most of those hints would be inaccurate or not applicable.
> > >
> > > For a particular hint, say "/*+ nohashjoin */“, some piece of code would need to look at the initial RelNode tree and take its own action: say, build a data structure to be used by planner rules, or enable or disable planner rules.
> > >
> > >
> > > > On Apr 23, 2019, at 9:31 PM, Chunwei Lei <ch...@gmail.com> wrote:
> > > >
> > > > Thanks Danny.
> > > >
> > > > Those are good points. I think it depends on what we consider hint as.
> > > > IMHO, if we consider hint as a kind of metadata,
> > > > it is not a good idea to store the hints in the RelNode instance.
> > > >
> > > >
> > > >
> > > > Best,
> > > > Chunwei
> > > >
> > > > On Wed, Apr 24, 2019 at 11:09 AM Yuzhao Chen <yu...@gmail.com> wrote:
> > > > >
> > > > > Thx, Julian
> > > > >
> > > > > I think the hint path is a good way for searching RelNode’s parents, broadly, there may be these modules/things need to be modified:
> > > > >
> > > > > 1. Supports hints grammar for parser.jj
> > > > > 2. Cache the hints in the RelNode instance, and add method like RelNode#getHints() to fetch all the hints inherited for this node.
> > > > > 3. Modify #copy method for every kind of RelNode so that the hints can be copied when creating new equivalent nodes.
> > > > > 4. Add a visitor in after sql-to-rel phrase, to set up full hints list for every children RelNode if there exists any.
> > > > > 5. Add hints metadata handler and handles the hints fetching and overriding for specific kind of RelNode
> > > > >
> > > > > The 2 and 3 are the modifications that i really want to confirm, that is, shall we store the hints in the RelNode instance ?
> > > > >
> > > > > These are initial thoughts and if we make agreement, I would output a detail design doc which contains:
> > > > >
> > > > > 1. The hints grammar supported for the major sql engines
> > > > > 2. The hints grammar supported for Apache Calcite
> > > > > 3. The interface and design ideas of the proposed modifications
> > > > >
> > > > >
> > > > > Best,
> > > > > Danny Chan
> > > > > 在 2019年4月24日 +0800 AM3:04，Julian Hyde <jh...@apache.org>，写道：
> > > > > > I see that if you have a hint on, say, the root node then it would be nice for its child or grand-child to be able to see that hint.
> > > > > >
> > > > > > How about giving each hint an inherit path? Thus given
> > > > > >
> > > > > > Filter Hint1
> > > > > > +- Join
> > > > > > +- Scan
> > > > > > +- Project Hint2
> > > > > > +- Scan
> > > > > >
> > > > > >
> > > > > > Filter would have hints {Hint1[]}
> > > > > > Join would have hints {Hint1[0]}
> > > > > > Scan would have hints {Hint1[0, 0]}
> > > > > > Project would have hints {Hint1[0,1], Hint2}
> > > > > > Scan2 would have hints {[Hint1[0, 0, 1, 0], Hint2[0]}
> > > > > >
> > > > > > You could populate the hints and inherit paths with a single visitor pass after sql-to-rel conversion.
> > > > > >
> > > > > > By the way, I still like the idea of having kinds as a kind of RelMetadata, but I realize that a given RelNode might have more than one hint. So I think that the getHints(RelNode) method would return a List<Hint>, with Hint as follows:
> > > > > >
> > > > > > class Hint {
> > > > > > public final List<Integer> inheritPath; // immutable, not null
> > > > > > public final String type; // not null
> > > > > > public final Object operand; // immutable, may be null, must be JSON data
> > > > > > }
> > > > > >
> > > > > > operand must be JSON-style data (null, boolean, number, String, immutable List of JSON data, or immutable order-preserving Map from String to JSON data).
> > > > > >
> > > > > > > On Apr 23, 2019, at 1:25 AM, Yuzhao Chen <yu...@gmail.com> wrote:
> > > > > > >
> > > > > > > Thx, Andrew
> > > > > > >
> > > > > > > I don’t want to have a custom RelNode class, I hope all the work about hints would be contributed to the community. I want to find an acceptable way to keep and propagate the hints if we use the MetadataHandler to cache and query the hints.
> > > > > > >
> > > > > > > I don’t think the hints should be mixed into the cost model, that would make the cost computation very complex and hard to maintain, we only need the hints in our planning phrase to give suggestions, hints is more like another guideline for me and transparent to the planner.
> > > > > > >
> > > > > > > Best,
> > > > > > > Danny Chan
> > > > > > > 在 2019年4月23日 +0800 PM2:24，Андрей Цвелодуб <a....@gmail.com>，写道：
> > > > > > > > Hi Danny,
> > > > > > > >
> > > > > > > > I would also agree with Julian on his position. I've tried to get around
> > > > > > > > this limitation in several different ways, but none of it ended well :)
> > > > > > > >
> > > > > > > > For your idea with hints, if you have custom RelNode classes, you can add
> > > > > > > > hint as an additional field of the class and you can write a simple rule
> > > > > > > > that propagates the hint downwards, step by step. And also include the hint
> > > > > > > > in your cost estimation, so that nodes with hints would be more attractive
> > > > > > > > to the planner. I'm not sure this would be the most correct way to use the
> > > > > > > > cost mechanism, but at least it is straightforward and it works.
> > > > > > > >
> > > > > > > > Best Regards,
> > > > > > > > Andrew Tsvelodub
> > > > > > > >
> > > > > > > > On Tue, 23 Apr 2019 at 08:44, Yuzhao Chen <yu...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Julian,
> > > > > > > > >
> > > > > > > > > I want to add hint support for Calcite, the initial idea was to tag a
> > > > > > > > > RelNode(transformed from a SqlNode with hint) with a hit attribute(or
> > > > > > > > > trait), then I hope that the children (inputs) of it can see this hint, so
> > > > > > > > > to make some decisions if it should consume or propagate the hint.
> > > > > > > > >
> > > > > > > > > The problem I got here is the trait propagate from inputs from, which is
> > > > > > > > > the opposite as what I need, can you give some suggestions ? If I use
> > > > > > > > > MetadataHandler to cache and propagate the hints, how to propagate from
> > > > > > > > > parents to children ?
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Danny Chan
> > > > > > > > > 在 2019年4月23日 +0800 AM3:14，Julian Hyde <jh...@apache.org>，写道：
> > > > > > > > > > TL;DR: RelNodes don’t really have parents. Be careful if you are relying
> > > > > > > > > on the parent concept too much. Rely on rules instead.
> > > > > > > > > >
> > > > > > > > > > In the Volcano model, a RelNode doesn’t really have a parent. It might
> > > > > > > > > be used in several places. (RelSet has a field ‘List<RelNode> parents’ that
> > > > > > > > > is kept up to date as planing progresses. But it’s really for Volcano’s
> > > > > > > > > internal use.)
> > > > > > > > > >
> > > > > > > > > > Even if you are not using Volcano, there are reasons to want the RelNode
> > > > > > > > > graph to be a dag, so again, a RelNode doesn’t have a unique parent.
> > > > > > > > > >
> > > > > > > > > > RelShuttleImpl has a stack. You can use that to find the parent. But the
> > > > > > > > > “parent” is just “where we came from as we traversed the RelNode graph”.
> > > > > > > > > There may be other “parents” that you do not know about.
> > > > > > > > > >
> > > > > > > > > > If you have a Project and want to find all parents that are Filters,
> > > > > > > > > don’t even think about “iterating over the parents” of the Project. Just
> > > > > > > > > write a rule that matches a Filter on a Project, and trust Volcano to do
> > > > > > > > > its job.
> > > > > > > > > >
> > > > > > > > > > Julian
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > On Apr 22, 2019, at 6:15 AM, Yuzhao Chen <yu...@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Thx, Stamatis, that somehow make sense, if i pass around the parent
> > > > > > > > > node every time I visit a RelNode and keep the parents in the cache, but it
> > > > > > > > > is still not that intuitive. Actually I what a to add a new RelTrait which
> > > > > > > > > bind to a specific scope, for example:
> > > > > > > > > > >
> > > > > > > > > > > join-rel(trait1)
> > > > > > > > > > > / \
> > > > > > > > > > > join2 join3
> > > > > > > > > > >
> > > > > > > > > > > Join-rel has a trait trait1, and I want all the children of join-rel
> > > > > > > > > can see this trait, with Calcite’s default metadata handler, I can only see
> > > > > > > > > the trait from children nodes(traits propagate from the inputs), and I have
> > > > > > > > > no idea how to propagate a trait reversely?
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Danny Chan
> > > > > > > > > > > 在 2019年4月22日 +0800 PM8:44，Stamatis Zampetakis <za...@gmail.com>，写道：
> > > > > > > > > > > > Hi Danny,
> > > > > > > > > > > >
> > > > > > > > > > > > Apart from RelShuttle there is also RelVisitor which has a visit
> > > > > > > > > method
> > > > > > > > > > > > that provides the parent [1]. Not sure, if it suits your needs.
> > > > > > > > > > > >
> > > > > > > > > > > > Best,
> > > > > > > > > > > > Stamatis
> > > > > > > > > > > >
> > > > > > > > > > > > [1]
> > > > > > > > > > > >
> > > > > > > > > https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/RelVisitor.java#L43
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Apr 22, 2019 at 2:14 PM Yuzhao Chen <yu...@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Now for RelNode, we have method getInput()[1] to fetch the input
> > > > > > > > > > > > > RelNodes, but how we fetch the parent ?
> > > > > > > > > > > > >
> > > > > > > > > > > > > For example, we have plan:
> > > > > > > > > > > > >
> > > > > > > > > > > > > join-rel
> > > > > > > > > > > > > / \
> > > > > > > > > > > > > scan1 scan2
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > We can get scan1 and scan2 in join-rel directly with method
> > > > > > > > > getInput, but
> > > > > > > > > > > > > how can we get the join rel in scan1 and scan 2 ?
> > > > > > > > > > > > >
> > > > > > > > > > > > > I know that there is a RelShuttle that can visit every RelNode and
> > > > > > > > > if I
> > > > > > > > > > > > > make a cache for the inputs mapping, finally I can get the
> > > > > > > > > ‘parents’ from
> > > > > > > > > > > > > the cache, but this is boring code and not that intuitive.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Do you guys have any good ideas ?
> > > > > > > > > > > > >
> > > > > > > > > > > > > [1]
> > > > > > > > > > > > >
> > > > > > > > > https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/RelNode.java#L132
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best,
> > > > > > > > > > > > > Danny Chan
> > > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > >
> > >
>

Re: How to traverse RelNode’s parent conviniently?

Posted by Julian Hyde <jh...@apache.org>.

The RelMetadata system is designed for these kinds of annotations - if there is a “global hints cache” there’s no benefit to doing it outside the RelMetadata system.

That said, I don’t know (and I don’t think anyone knows) how we want hints to be propagated as we generate RelNodes from RelNodes. I think we should focus on really simple cases first (e.g. hints about the whole query, or about particular table scans), and not try to automatically propagate them.

We can make the hints propagation mechanism more sophisticated when we have an actual use case to drive us.

Julian
 

> On Apr 26, 2019, at 3:41 PM, Yuzhao Chen <yu...@gmail.com> wrote:
> 
> Thx, Julian
> 
> Let me repeat my thoughts about the details again, in order to implement the hints, maybe these things are needed:
> 
> The main diff is that we will maintain a global hints cache
> 1. Supports hints grammar for parser.jj
> 2. During/after sql-to-rel, we may pass a hints cache to the SqlToRelConverter, there is a visitor to setup/init the RelNodes’hints to the cache once at a time, this cache scope is global and would be active the whole query planning time. The cache only keep hints for few top nodes that really needs
> 3. In the Planner, add set/get hints cache method, so that in the planning rules, we can see the hints cache,
> And we can also ban some rule matching in the planner
> 4. Hook the RelOptCall#transformTo method to handle logic of hints propagating(invoke the hints logic again same as sql-to-rel phrase), this will also update the global hints cache
> It seems that given the global hints cache, we do not need the MetaDataHandler any more, this is the thing I most want to make sure.
> Hope for your suggestions.
> 
> Best,
> Danny Chan
> 在 2019年4月25日 +0800 AM3:07，Julian Hyde <jh...@apache.org>，写道：
>> I think it’s OK to attach hints to the (few) RelNodes that come out of the SqlToRelConverter.
>> 
>> But it would be a mistake to try to propagate those hints to all of the RelNodes that are created during query planning. Even if we changed all of the copy methods (a huge task) there are many other ways that RelNodes get created. We would end up with a RelNode graph with lots of hints, and most of those hints would be inaccurate or not applicable.
>> 
>> For a particular hint, say "/*+ nohashjoin */“, some piece of code would need to look at the initial RelNode tree and take its own action: say, build a data structure to be used by planner rules, or enable or disable planner rules.
>> 
>> 
>>> On Apr 23, 2019, at 9:31 PM, Chunwei Lei <ch...@gmail.com> wrote:
>>> 
>>> Thanks Danny.
>>> 
>>> Those are good points. I think it depends on what we consider hint as.
>>> IMHO, if we consider hint as a kind of metadata,
>>> it is not a good idea to store the hints in the RelNode instance.
>>> 
>>> 
>>> 
>>> Best,
>>> Chunwei
>>> 
>>> On Wed, Apr 24, 2019 at 11:09 AM Yuzhao Chen <yu...@gmail.com> wrote:
>>>> 
>>>> Thx, Julian
>>>> 
>>>> I think the hint path is a good way for searching RelNode’s parents, broadly, there may be these modules/things need to be modified:
>>>> 
>>>> 1. Supports hints grammar for parser.jj
>>>> 2. Cache the hints in the RelNode instance, and add method like RelNode#getHints() to fetch all the hints inherited for this node.
>>>> 3. Modify #copy method for every kind of RelNode so that the hints can be copied when creating new equivalent nodes.
>>>> 4. Add a visitor in after sql-to-rel phrase, to set up full hints list for every children RelNode if there exists any.
>>>> 5. Add hints metadata handler and handles the hints fetching and overriding for specific kind of RelNode
>>>> 
>>>> The 2 and 3 are the modifications that i really want to confirm, that is, shall we store the hints in the RelNode instance ?
>>>> 
>>>> These are initial thoughts and if we make agreement, I would output a detail design doc which contains:
>>>> 
>>>> 1. The hints grammar supported for the major sql engines
>>>> 2. The hints grammar supported for Apache Calcite
>>>> 3. The interface and design ideas of the proposed modifications
>>>> 
>>>> 
>>>> Best,
>>>> Danny Chan
>>>> 在 2019年4月24日 +0800 AM3:04，Julian Hyde <jh...@apache.org>，写道：
>>>>> I see that if you have a hint on, say, the root node then it would be nice for its child or grand-child to be able to see that hint.
>>>>> 
>>>>> How about giving each hint an inherit path? Thus given
>>>>> 
>>>>> Filter Hint1
>>>>> +- Join
>>>>> +- Scan
>>>>> +- Project Hint2
>>>>> +- Scan
>>>>> 
>>>>> 
>>>>> Filter would have hints {Hint1[]}
>>>>> Join would have hints {Hint1[0]}
>>>>> Scan would have hints {Hint1[0, 0]}
>>>>> Project would have hints {Hint1[0,1], Hint2}
>>>>> Scan2 would have hints {[Hint1[0, 0, 1, 0], Hint2[0]}
>>>>> 
>>>>> You could populate the hints and inherit paths with a single visitor pass after sql-to-rel conversion.
>>>>> 
>>>>> By the way, I still like the idea of having kinds as a kind of RelMetadata, but I realize that a given RelNode might have more than one hint. So I think that the getHints(RelNode) method would return a List<Hint>, with Hint as follows:
>>>>> 
>>>>> class Hint {
>>>>> public final List<Integer> inheritPath; // immutable, not null
>>>>> public final String type; // not null
>>>>> public final Object operand; // immutable, may be null, must be JSON data
>>>>> }
>>>>> 
>>>>> operand must be JSON-style data (null, boolean, number, String, immutable List of JSON data, or immutable order-preserving Map from String to JSON data).
>>>>> 
>>>>>> On Apr 23, 2019, at 1:25 AM, Yuzhao Chen <yu...@gmail.com> wrote:
>>>>>> 
>>>>>> Thx, Andrew
>>>>>> 
>>>>>> I don’t want to have a custom RelNode class, I hope all the work about hints would be contributed to the community. I want to find an acceptable way to keep and propagate the hints if we use the MetadataHandler to cache and query the hints.
>>>>>> 
>>>>>> I don’t think the hints should be mixed into the cost model, that would make the cost computation very complex and hard to maintain, we only need the hints in our planning phrase to give suggestions, hints is more like another guideline for me and transparent to the planner.
>>>>>> 
>>>>>> Best,
>>>>>> Danny Chan
>>>>>> 在 2019年4月23日 +0800 PM2:24，Андрей Цвелодуб <a....@gmail.com>，写道：
>>>>>>> Hi Danny,
>>>>>>> 
>>>>>>> I would also agree with Julian on his position. I've tried to get around
>>>>>>> this limitation in several different ways, but none of it ended well :)
>>>>>>> 
>>>>>>> For your idea with hints, if you have custom RelNode classes, you can add
>>>>>>> hint as an additional field of the class and you can write a simple rule
>>>>>>> that propagates the hint downwards, step by step. And also include the hint
>>>>>>> in your cost estimation, so that nodes with hints would be more attractive
>>>>>>> to the planner. I'm not sure this would be the most correct way to use the
>>>>>>> cost mechanism, but at least it is straightforward and it works.
>>>>>>> 
>>>>>>> Best Regards,
>>>>>>> Andrew Tsvelodub
>>>>>>> 
>>>>>>> On Tue, 23 Apr 2019 at 08:44, Yuzhao Chen <yu...@gmail.com> wrote:
>>>>>>> 
>>>>>>>> Julian,
>>>>>>>> 
>>>>>>>> I want to add hint support for Calcite, the initial idea was to tag a
>>>>>>>> RelNode(transformed from a SqlNode with hint) with a hit attribute(or
>>>>>>>> trait), then I hope that the children (inputs) of it can see this hint, so
>>>>>>>> to make some decisions if it should consume or propagate the hint.
>>>>>>>> 
>>>>>>>> The problem I got here is the trait propagate from inputs from, which is
>>>>>>>> the opposite as what I need, can you give some suggestions ? If I use
>>>>>>>> MetadataHandler to cache and propagate the hints, how to propagate from
>>>>>>>> parents to children ?
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Danny Chan
>>>>>>>> 在 2019年4月23日 +0800 AM3:14，Julian Hyde <jh...@apache.org>，写道：
>>>>>>>>> TL;DR: RelNodes don’t really have parents. Be careful if you are relying
>>>>>>>> on the parent concept too much. Rely on rules instead.
>>>>>>>>> 
>>>>>>>>> In the Volcano model, a RelNode doesn’t really have a parent. It might
>>>>>>>> be used in several places. (RelSet has a field ‘List<RelNode> parents’ that
>>>>>>>> is kept up to date as planing progresses. But it’s really for Volcano’s
>>>>>>>> internal use.)
>>>>>>>>> 
>>>>>>>>> Even if you are not using Volcano, there are reasons to want the RelNode
>>>>>>>> graph to be a dag, so again, a RelNode doesn’t have a unique parent.
>>>>>>>>> 
>>>>>>>>> RelShuttleImpl has a stack. You can use that to find the parent. But the
>>>>>>>> “parent” is just “where we came from as we traversed the RelNode graph”.
>>>>>>>> There may be other “parents” that you do not know about.
>>>>>>>>> 
>>>>>>>>> If you have a Project and want to find all parents that are Filters,
>>>>>>>> don’t even think about “iterating over the parents” of the Project. Just
>>>>>>>> write a rule that matches a Filter on a Project, and trust Volcano to do
>>>>>>>> its job.
>>>>>>>>> 
>>>>>>>>> Julian
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Apr 22, 2019, at 6:15 AM, Yuzhao Chen <yu...@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> Thx, Stamatis, that somehow make sense, if i pass around the parent
>>>>>>>> node every time I visit a RelNode and keep the parents in the cache, but it
>>>>>>>> is still not that intuitive. Actually I what a to add a new RelTrait which
>>>>>>>> bind to a specific scope, for example:
>>>>>>>>>> 
>>>>>>>>>> join-rel(trait1)
>>>>>>>>>> / \
>>>>>>>>>> join2 join3
>>>>>>>>>> 
>>>>>>>>>> Join-rel has a trait trait1, and I want all the children of join-rel
>>>>>>>> can see this trait, with Calcite’s default metadata handler, I can only see
>>>>>>>> the trait from children nodes(traits propagate from the inputs), and I have
>>>>>>>> no idea how to propagate a trait reversely?
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Best,
>>>>>>>>>> Danny Chan
>>>>>>>>>> 在 2019年4月22日 +0800 PM8:44，Stamatis Zampetakis <za...@gmail.com>，写道：
>>>>>>>>>>> Hi Danny,
>>>>>>>>>>> 
>>>>>>>>>>> Apart from RelShuttle there is also RelVisitor which has a visit
>>>>>>>> method
>>>>>>>>>>> that provides the parent [1]. Not sure, if it suits your needs.
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Stamatis
>>>>>>>>>>> 
>>>>>>>>>>> [1]
>>>>>>>>>>> 
>>>>>>>> https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/RelVisitor.java#L43
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Mon, Apr 22, 2019 at 2:14 PM Yuzhao Chen <yu...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Now for RelNode, we have method getInput()[1] to fetch the input
>>>>>>>>>>>> RelNodes, but how we fetch the parent ?
>>>>>>>>>>>> 
>>>>>>>>>>>> For example, we have plan:
>>>>>>>>>>>> 
>>>>>>>>>>>> join-rel
>>>>>>>>>>>> / \
>>>>>>>>>>>> scan1 scan2
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> We can get scan1 and scan2 in join-rel directly with method
>>>>>>>> getInput, but
>>>>>>>>>>>> how can we get the join rel in scan1 and scan 2 ?
>>>>>>>>>>>> 
>>>>>>>>>>>> I know that there is a RelShuttle that can visit every RelNode and
>>>>>>>> if I
>>>>>>>>>>>> make a cache for the inputs mapping, finally I can get the
>>>>>>>> ‘parents’ from
>>>>>>>>>>>> the cache, but this is boring code and not that intuitive.
>>>>>>>>>>>> 
>>>>>>>>>>>> Do you guys have any good ideas ?
>>>>>>>>>>>> 
>>>>>>>>>>>> [1]
>>>>>>>>>>>> 
>>>>>>>> https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/RelNode.java#L132
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Danny Chan
>>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>> 
>>

Re: How to traverse RelNode’s parent conviniently?

Posted by Yuzhao Chen <yu...@gmail.com>.

Thx, Julian

Let me repeat my thoughts about the details again, in order to implement the hints, maybe these things are needed:

The main diff is that we will maintain a global hints cache
1. Supports hints grammar for parser.jj
2. During/after sql-to-rel, we may pass a hints cache to the SqlToRelConverter, there is a visitor to setup/init the RelNodes’hints to the cache once at a time, this cache scope is global and would be active the whole query planning time. The cache only keep hints for few top nodes that really needs
3. In the Planner, add set/get hints cache method, so that in the planning rules, we can see the hints cache,
And we can also ban some rule matching in the planner
4. Hook the RelOptCall#transformTo method to handle logic of hints propagating(invoke the hints logic again same as sql-to-rel phrase), this will also update the global hints cache
It seems that given the global hints cache, we do not need the MetaDataHandler any more, this is the thing I most want to make sure.
Hope for your suggestions.

Best,
Danny Chan
在 2019年4月25日 +0800 AM3:07，Julian Hyde <jh...@apache.org>，写道：
> I think it’s OK to attach hints to the (few) RelNodes that come out of the SqlToRelConverter.
>
> But it would be a mistake to try to propagate those hints to all of the RelNodes that are created during query planning. Even if we changed all of the copy methods (a huge task) there are many other ways that RelNodes get created. We would end up with a RelNode graph with lots of hints, and most of those hints would be inaccurate or not applicable.
>
> For a particular hint, say "/*+ nohashjoin */“, some piece of code would need to look at the initial RelNode tree and take its own action: say, build a data structure to be used by planner rules, or enable or disable planner rules.
>
>
> > On Apr 23, 2019, at 9:31 PM, Chunwei Lei <ch...@gmail.com> wrote:
> >
> > Thanks Danny.
> >
> > Those are good points. I think it depends on what we consider hint as.
> > IMHO, if we consider hint as a kind of metadata,
> > it is not a good idea to store the hints in the RelNode instance.
> >
> >
> >
> > Best,
> > Chunwei
> >
> > On Wed, Apr 24, 2019 at 11:09 AM Yuzhao Chen <yu...@gmail.com> wrote:
> > >
> > > Thx, Julian
> > >
> > > I think the hint path is a good way for searching RelNode’s parents, broadly, there may be these modules/things need to be modified:
> > >
> > > 1. Supports hints grammar for parser.jj
> > > 2. Cache the hints in the RelNode instance, and add method like RelNode#getHints() to fetch all the hints inherited for this node.
> > > 3. Modify #copy method for every kind of RelNode so that the hints can be copied when creating new equivalent nodes.
> > > 4. Add a visitor in after sql-to-rel phrase, to set up full hints list for every children RelNode if there exists any.
> > > 5. Add hints metadata handler and handles the hints fetching and overriding for specific kind of RelNode
> > >
> > > The 2 and 3 are the modifications that i really want to confirm, that is, shall we store the hints in the RelNode instance ?
> > >
> > > These are initial thoughts and if we make agreement, I would output a detail design doc which contains:
> > >
> > > 1. The hints grammar supported for the major sql engines
> > > 2. The hints grammar supported for Apache Calcite
> > > 3. The interface and design ideas of the proposed modifications
> > >
> > >
> > > Best,
> > > Danny Chan
> > > 在 2019年4月24日 +0800 AM3:04，Julian Hyde <jh...@apache.org>，写道：
> > > > I see that if you have a hint on, say, the root node then it would be nice for its child or grand-child to be able to see that hint.
> > > >
> > > > How about giving each hint an inherit path? Thus given
> > > >
> > > > Filter Hint1
> > > > +- Join
> > > > +- Scan
> > > > +- Project Hint2
> > > > +- Scan
> > > >
> > > >
> > > > Filter would have hints {Hint1[]}
> > > > Join would have hints {Hint1[0]}
> > > > Scan would have hints {Hint1[0, 0]}
> > > > Project would have hints {Hint1[0,1], Hint2}
> > > > Scan2 would have hints {[Hint1[0, 0, 1, 0], Hint2[0]}
> > > >
> > > > You could populate the hints and inherit paths with a single visitor pass after sql-to-rel conversion.
> > > >
> > > > By the way, I still like the idea of having kinds as a kind of RelMetadata, but I realize that a given RelNode might have more than one hint. So I think that the getHints(RelNode) method would return a List<Hint>, with Hint as follows:
> > > >
> > > > class Hint {
> > > > public final List<Integer> inheritPath; // immutable, not null
> > > > public final String type; // not null
> > > > public final Object operand; // immutable, may be null, must be JSON data
> > > > }
> > > >
> > > > operand must be JSON-style data (null, boolean, number, String, immutable List of JSON data, or immutable order-preserving Map from String to JSON data).
> > > >
> > > > > On Apr 23, 2019, at 1:25 AM, Yuzhao Chen <yu...@gmail.com> wrote:
> > > > >
> > > > > Thx, Andrew
> > > > >
> > > > > I don’t want to have a custom RelNode class, I hope all the work about hints would be contributed to the community. I want to find an acceptable way to keep and propagate the hints if we use the MetadataHandler to cache and query the hints.
> > > > >
> > > > > I don’t think the hints should be mixed into the cost model, that would make the cost computation very complex and hard to maintain, we only need the hints in our planning phrase to give suggestions, hints is more like another guideline for me and transparent to the planner.
> > > > >
> > > > > Best,
> > > > > Danny Chan
> > > > > 在 2019年4月23日 +0800 PM2:24，Андрей Цвелодуб <a....@gmail.com>，写道：
> > > > > > Hi Danny,
> > > > > >
> > > > > > I would also agree with Julian on his position. I've tried to get around
> > > > > > this limitation in several different ways, but none of it ended well :)
> > > > > >
> > > > > > For your idea with hints, if you have custom RelNode classes, you can add
> > > > > > hint as an additional field of the class and you can write a simple rule
> > > > > > that propagates the hint downwards, step by step. And also include the hint
> > > > > > in your cost estimation, so that nodes with hints would be more attractive
> > > > > > to the planner. I'm not sure this would be the most correct way to use the
> > > > > > cost mechanism, but at least it is straightforward and it works.
> > > > > >
> > > > > > Best Regards,
> > > > > > Andrew Tsvelodub
> > > > > >
> > > > > > On Tue, 23 Apr 2019 at 08:44, Yuzhao Chen <yu...@gmail.com> wrote:
> > > > > >
> > > > > > > Julian,
> > > > > > >
> > > > > > > I want to add hint support for Calcite, the initial idea was to tag a
> > > > > > > RelNode(transformed from a SqlNode with hint) with a hit attribute(or
> > > > > > > trait), then I hope that the children (inputs) of it can see this hint, so
> > > > > > > to make some decisions if it should consume or propagate the hint.
> > > > > > >
> > > > > > > The problem I got here is the trait propagate from inputs from, which is
> > > > > > > the opposite as what I need, can you give some suggestions ? If I use
> > > > > > > MetadataHandler to cache and propagate the hints, how to propagate from
> > > > > > > parents to children ?
> > > > > > >
> > > > > > > Best,
> > > > > > > Danny Chan
> > > > > > > 在 2019年4月23日 +0800 AM3:14，Julian Hyde <jh...@apache.org>，写道：
> > > > > > > > TL;DR: RelNodes don’t really have parents. Be careful if you are relying
> > > > > > > on the parent concept too much. Rely on rules instead.
> > > > > > > >
> > > > > > > > In the Volcano model, a RelNode doesn’t really have a parent. It might
> > > > > > > be used in several places. (RelSet has a field ‘List<RelNode> parents’ that
> > > > > > > is kept up to date as planing progresses. But it’s really for Volcano’s
> > > > > > > internal use.)
> > > > > > > >
> > > > > > > > Even if you are not using Volcano, there are reasons to want the RelNode
> > > > > > > graph to be a dag, so again, a RelNode doesn’t have a unique parent.
> > > > > > > >
> > > > > > > > RelShuttleImpl has a stack. You can use that to find the parent. But the
> > > > > > > “parent” is just “where we came from as we traversed the RelNode graph”.
> > > > > > > There may be other “parents” that you do not know about.
> > > > > > > >
> > > > > > > > If you have a Project and want to find all parents that are Filters,
> > > > > > > don’t even think about “iterating over the parents” of the Project. Just
> > > > > > > write a rule that matches a Filter on a Project, and trust Volcano to do
> > > > > > > its job.
> > > > > > > >
> > > > > > > > Julian
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > On Apr 22, 2019, at 6:15 AM, Yuzhao Chen <yu...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Thx, Stamatis, that somehow make sense, if i pass around the parent
> > > > > > > node every time I visit a RelNode and keep the parents in the cache, but it
> > > > > > > is still not that intuitive. Actually I what a to add a new RelTrait which
> > > > > > > bind to a specific scope, for example:
> > > > > > > > >
> > > > > > > > > join-rel(trait1)
> > > > > > > > > / \
> > > > > > > > > join2 join3
> > > > > > > > >
> > > > > > > > > Join-rel has a trait trait1, and I want all the children of join-rel
> > > > > > > can see this trait, with Calcite’s default metadata handler, I can only see
> > > > > > > the trait from children nodes(traits propagate from the inputs), and I have
> > > > > > > no idea how to propagate a trait reversely?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Danny Chan
> > > > > > > > > 在 2019年4月22日 +0800 PM8:44，Stamatis Zampetakis <za...@gmail.com>，写道：
> > > > > > > > > > Hi Danny,
> > > > > > > > > >
> > > > > > > > > > Apart from RelShuttle there is also RelVisitor which has a visit
> > > > > > > method
> > > > > > > > > > that provides the parent [1]. Not sure, if it suits your needs.
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Stamatis
> > > > > > > > > >
> > > > > > > > > > [1]
> > > > > > > > > >
> > > > > > > https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/RelVisitor.java#L43
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Mon, Apr 22, 2019 at 2:14 PM Yuzhao Chen <yu...@gmail.com>
> > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Now for RelNode, we have method getInput()[1] to fetch the input
> > > > > > > > > > > RelNodes, but how we fetch the parent ?
> > > > > > > > > > >
> > > > > > > > > > > For example, we have plan:
> > > > > > > > > > >
> > > > > > > > > > > join-rel
> > > > > > > > > > > / \
> > > > > > > > > > > scan1 scan2
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > We can get scan1 and scan2 in join-rel directly with method
> > > > > > > getInput, but
> > > > > > > > > > > how can we get the join rel in scan1 and scan 2 ?
> > > > > > > > > > >
> > > > > > > > > > > I know that there is a RelShuttle that can visit every RelNode and
> > > > > > > if I
> > > > > > > > > > > make a cache for the inputs mapping, finally I can get the
> > > > > > > ‘parents’ from
> > > > > > > > > > > the cache, but this is boring code and not that intuitive.
> > > > > > > > > > >
> > > > > > > > > > > Do you guys have any good ideas ?
> > > > > > > > > > >
> > > > > > > > > > > [1]
> > > > > > > > > > >
> > > > > > > https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/RelNode.java#L132
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Danny Chan
> > > > > > > > > > >
> > > > > > > >
> > > > > > >
> > > >
>

Re: How to traverse RelNode’s parent conviniently?

Posted by Julian Hyde <jh...@apache.org>.

I think it’s OK to attach hints to the (few) RelNodes that come out of the SqlToRelConverter.

But it would be a mistake to try to propagate those hints to all of the RelNodes that are created during query planning. Even if we changed all of the copy methods (a huge task) there are many other ways that RelNodes get created. We would end up with a RelNode graph with lots of hints, and most of those hints would be inaccurate or not applicable.

For a particular hint, say "/*+ nohashjoin */“, some piece of code would need to look at the initial RelNode tree and take its own action: say, build a data structure to be used by planner rules, or enable or disable planner rules.


> On Apr 23, 2019, at 9:31 PM, Chunwei Lei <ch...@gmail.com> wrote:
> 
> Thanks Danny.
> 
> Those are good points. I think it depends on what we consider hint as.
> IMHO, if we consider hint as a kind of metadata,
> it is not a good idea to store the hints in the RelNode instance.
> 
> 
> 
> Best,
> Chunwei
> 
> On Wed, Apr 24, 2019 at 11:09 AM Yuzhao Chen <yu...@gmail.com> wrote:
>> 
>> Thx, Julian
>> 
>> I think the hint path is a good way for searching RelNode’s parents, broadly, there may be these modules/things need to be modified:
>> 
>> 1. Supports hints grammar for parser.jj
>> 2. Cache the hints in the RelNode instance, and add method like RelNode#getHints() to fetch all the hints inherited for this node.
>> 3. Modify #copy method for every kind of RelNode so that the hints can be copied when creating new equivalent nodes.
>> 4. Add a visitor in after sql-to-rel phrase, to set up full hints list for every children RelNode if there exists any.
>> 5. Add hints metadata handler and handles the hints fetching and overriding for specific kind of RelNode
>> 
>> The 2 and 3 are the modifications that i really want to confirm, that is, shall we store the hints in the RelNode instance ?
>> 
>> These are initial thoughts and if we make agreement, I would output a detail design doc which contains:
>> 
>> 1. The hints grammar supported for the major sql engines
>> 2. The hints grammar supported for Apache Calcite
>> 3. The interface and design ideas of the proposed modifications
>> 
>> 
>> Best,
>> Danny Chan
>> 在 2019年4月24日 +0800 AM3:04，Julian Hyde <jh...@apache.org>，写道：
>>> I see that if you have a hint on, say, the root node then it would be nice for its child or grand-child to be able to see that hint.
>>> 
>>> How about giving each hint an inherit path? Thus given
>>> 
>>> Filter Hint1
>>> +- Join
>>> +- Scan
>>> +- Project Hint2
>>> +- Scan
>>> 
>>> 
>>> Filter would have hints {Hint1[]}
>>> Join would have hints {Hint1[0]}
>>> Scan would have hints {Hint1[0, 0]}
>>> Project would have hints {Hint1[0,1], Hint2}
>>> Scan2 would have hints {[Hint1[0, 0, 1, 0], Hint2[0]}
>>> 
>>> You could populate the hints and inherit paths with a single visitor pass after sql-to-rel conversion.
>>> 
>>> By the way, I still like the idea of having kinds as a kind of RelMetadata, but I realize that a given RelNode might have more than one hint. So I think that the getHints(RelNode) method would return a List<Hint>, with Hint as follows:
>>> 
>>> class Hint {
>>> public final List<Integer> inheritPath; // immutable, not null
>>> public final String type; // not null
>>> public final Object operand; // immutable, may be null, must be JSON data
>>> }
>>> 
>>> operand must be JSON-style data (null, boolean, number, String, immutable List of JSON data, or immutable order-preserving Map from String to JSON data).
>>> 
>>>> On Apr 23, 2019, at 1:25 AM, Yuzhao Chen <yu...@gmail.com> wrote:
>>>> 
>>>> Thx, Andrew
>>>> 
>>>> I don’t want to have a custom RelNode class, I hope all the work about hints would be contributed to the community. I want to find an acceptable way to keep and propagate the hints if we use the MetadataHandler to cache and query the hints.
>>>> 
>>>> I don’t think the hints should be mixed into the cost model, that would make the cost computation very complex and hard to maintain, we only need the hints in our planning phrase to give suggestions, hints is more like another guideline for me and transparent to the planner.
>>>> 
>>>> Best,
>>>> Danny Chan
>>>> 在 2019年4月23日 +0800 PM2:24，Андрей Цвелодуб <a....@gmail.com>，写道：
>>>>> Hi Danny,
>>>>> 
>>>>> I would also agree with Julian on his position. I've tried to get around
>>>>> this limitation in several different ways, but none of it ended well :)
>>>>> 
>>>>> For your idea with hints, if you have custom RelNode classes, you can add
>>>>> hint as an additional field of the class and you can write a simple rule
>>>>> that propagates the hint downwards, step by step. And also include the hint
>>>>> in your cost estimation, so that nodes with hints would be more attractive
>>>>> to the planner. I'm not sure this would be the most correct way to use the
>>>>> cost mechanism, but at least it is straightforward and it works.
>>>>> 
>>>>> Best Regards,
>>>>> Andrew Tsvelodub
>>>>> 
>>>>> On Tue, 23 Apr 2019 at 08:44, Yuzhao Chen <yu...@gmail.com> wrote:
>>>>> 
>>>>>> Julian,
>>>>>> 
>>>>>> I want to add hint support for Calcite, the initial idea was to tag a
>>>>>> RelNode(transformed from a SqlNode with hint) with a hit attribute(or
>>>>>> trait), then I hope that the children (inputs) of it can see this hint, so
>>>>>> to make some decisions if it should consume or propagate the hint.
>>>>>> 
>>>>>> The problem I got here is the trait propagate from inputs from, which is
>>>>>> the opposite as what I need, can you give some suggestions ? If I use
>>>>>> MetadataHandler to cache and propagate the hints, how to propagate from
>>>>>> parents to children ?
>>>>>> 
>>>>>> Best,
>>>>>> Danny Chan
>>>>>> 在 2019年4月23日 +0800 AM3:14，Julian Hyde <jh...@apache.org>，写道：
>>>>>>> TL;DR: RelNodes don’t really have parents. Be careful if you are relying
>>>>>> on the parent concept too much. Rely on rules instead.
>>>>>>> 
>>>>>>> In the Volcano model, a RelNode doesn’t really have a parent. It might
>>>>>> be used in several places. (RelSet has a field ‘List<RelNode> parents’ that
>>>>>> is kept up to date as planing progresses. But it’s really for Volcano’s
>>>>>> internal use.)
>>>>>>> 
>>>>>>> Even if you are not using Volcano, there are reasons to want the RelNode
>>>>>> graph to be a dag, so again, a RelNode doesn’t have a unique parent.
>>>>>>> 
>>>>>>> RelShuttleImpl has a stack. You can use that to find the parent. But the
>>>>>> “parent” is just “where we came from as we traversed the RelNode graph”.
>>>>>> There may be other “parents” that you do not know about.
>>>>>>> 
>>>>>>> If you have a Project and want to find all parents that are Filters,
>>>>>> don’t even think about “iterating over the parents” of the Project. Just
>>>>>> write a rule that matches a Filter on a Project, and trust Volcano to do
>>>>>> its job.
>>>>>>> 
>>>>>>> Julian
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On Apr 22, 2019, at 6:15 AM, Yuzhao Chen <yu...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> Thx, Stamatis, that somehow make sense, if i pass around the parent
>>>>>> node every time I visit a RelNode and keep the parents in the cache, but it
>>>>>> is still not that intuitive. Actually I what a to add a new RelTrait which
>>>>>> bind to a specific scope, for example:
>>>>>>>> 
>>>>>>>> join-rel(trait1)
>>>>>>>> / \
>>>>>>>> join2 join3
>>>>>>>> 
>>>>>>>> Join-rel has a trait trait1, and I want all the children of join-rel
>>>>>> can see this trait, with Calcite’s default metadata handler, I can only see
>>>>>> the trait from children nodes(traits propagate from the inputs), and I have
>>>>>> no idea how to propagate a trait reversely?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Danny Chan
>>>>>>>> 在 2019年4月22日 +0800 PM8:44，Stamatis Zampetakis <za...@gmail.com>，写道：
>>>>>>>>> Hi Danny,
>>>>>>>>> 
>>>>>>>>> Apart from RelShuttle there is also RelVisitor which has a visit
>>>>>> method
>>>>>>>>> that provides the parent [1]. Not sure, if it suits your needs.
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> Stamatis
>>>>>>>>> 
>>>>>>>>> [1]
>>>>>>>>> 
>>>>>> https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/RelVisitor.java#L43
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Mon, Apr 22, 2019 at 2:14 PM Yuzhao Chen <yu...@gmail.com>
>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Now for RelNode, we have method getInput()[1] to fetch the input
>>>>>>>>>> RelNodes, but how we fetch the parent ?
>>>>>>>>>> 
>>>>>>>>>> For example, we have plan:
>>>>>>>>>> 
>>>>>>>>>> join-rel
>>>>>>>>>> / \
>>>>>>>>>> scan1 scan2
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> We can get scan1 and scan2 in join-rel directly with method
>>>>>> getInput, but
>>>>>>>>>> how can we get the join rel in scan1 and scan 2 ?
>>>>>>>>>> 
>>>>>>>>>> I know that there is a RelShuttle that can visit every RelNode and
>>>>>> if I
>>>>>>>>>> make a cache for the inputs mapping, finally I can get the
>>>>>> ‘parents’ from
>>>>>>>>>> the cache, but this is boring code and not that intuitive.
>>>>>>>>>> 
>>>>>>>>>> Do you guys have any good ideas ?
>>>>>>>>>> 
>>>>>>>>>> [1]
>>>>>>>>>> 
>>>>>> https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/RelNode.java#L132
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Best,
>>>>>>>>>> Danny Chan
>>>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>

Re: How to traverse RelNode’s parent conviniently?

Posted by Chunwei Lei <ch...@gmail.com>.

Thanks Danny.

Those are good points. I think it depends on what we consider hint as.
IMHO, if we consider hint as a kind of metadata,
it is not a good idea to store the hints in the RelNode instance.



Best,
Chunwei

On Wed, Apr 24, 2019 at 11:09 AM Yuzhao Chen <yu...@gmail.com> wrote:
>
> Thx, Julian
>
> I think the hint path is a good way for searching RelNode’s parents, broadly, there may be these modules/things need to be modified:
>
> 1. Supports hints grammar for parser.jj
> 2. Cache the hints in the RelNode instance, and add method like RelNode#getHints() to fetch all the hints inherited for this node.
> 3. Modify #copy method for every kind of RelNode so that the hints can be copied when creating new equivalent nodes.
> 4. Add a visitor in after sql-to-rel phrase, to set up full hints list for every children RelNode if there exists any.
> 5. Add hints metadata handler and handles the hints fetching and overriding for specific kind of RelNode
>
> The 2 and 3 are the modifications that i really want to confirm, that is, shall we store the hints in the RelNode instance ?
>
> These are initial thoughts and if we make agreement, I would output a detail design doc which contains:
>
> 1. The hints grammar supported for the major sql engines
> 2. The hints grammar supported for Apache Calcite
> 3. The interface and design ideas of the proposed modifications
>
>
> Best,
> Danny Chan
> 在 2019年4月24日 +0800 AM3:04，Julian Hyde <jh...@apache.org>，写道：
> > I see that if you have a hint on, say, the root node then it would be nice for its child or grand-child to be able to see that hint.
> >
> > How about giving each hint an inherit path? Thus given
> >
> > Filter Hint1
> > +- Join
> > +- Scan
> > +- Project Hint2
> > +- Scan
> >
> >
> > Filter would have hints {Hint1[]}
> > Join would have hints {Hint1[0]}
> > Scan would have hints {Hint1[0, 0]}
> > Project would have hints {Hint1[0,1], Hint2}
> > Scan2 would have hints {[Hint1[0, 0, 1, 0], Hint2[0]}
> >
> > You could populate the hints and inherit paths with a single visitor pass after sql-to-rel conversion.
> >
> > By the way, I still like the idea of having kinds as a kind of RelMetadata, but I realize that a given RelNode might have more than one hint. So I think that the getHints(RelNode) method would return a List<Hint>, with Hint as follows:
> >
> > class Hint {
> > public final List<Integer> inheritPath; // immutable, not null
> > public final String type; // not null
> > public final Object operand; // immutable, may be null, must be JSON data
> > }
> >
> > operand must be JSON-style data (null, boolean, number, String, immutable List of JSON data, or immutable order-preserving Map from String to JSON data).
> >
> > > On Apr 23, 2019, at 1:25 AM, Yuzhao Chen <yu...@gmail.com> wrote:
> > >
> > > Thx, Andrew
> > >
> > > I don’t want to have a custom RelNode class, I hope all the work about hints would be contributed to the community. I want to find an acceptable way to keep and propagate the hints if we use the MetadataHandler to cache and query the hints.
> > >
> > > I don’t think the hints should be mixed into the cost model, that would make the cost computation very complex and hard to maintain, we only need the hints in our planning phrase to give suggestions, hints is more like another guideline for me and transparent to the planner.
> > >
> > > Best,
> > > Danny Chan
> > > 在 2019年4月23日 +0800 PM2:24，Андрей Цвелодуб <a....@gmail.com>，写道：
> > > > Hi Danny,
> > > >
> > > > I would also agree with Julian on his position. I've tried to get around
> > > > this limitation in several different ways, but none of it ended well :)
> > > >
> > > > For your idea with hints, if you have custom RelNode classes, you can add
> > > > hint as an additional field of the class and you can write a simple rule
> > > > that propagates the hint downwards, step by step. And also include the hint
> > > > in your cost estimation, so that nodes with hints would be more attractive
> > > > to the planner. I'm not sure this would be the most correct way to use the
> > > > cost mechanism, but at least it is straightforward and it works.
> > > >
> > > > Best Regards,
> > > > Andrew Tsvelodub
> > > >
> > > > On Tue, 23 Apr 2019 at 08:44, Yuzhao Chen <yu...@gmail.com> wrote:
> > > >
> > > > > Julian,
> > > > >
> > > > > I want to add hint support for Calcite, the initial idea was to tag a
> > > > > RelNode(transformed from a SqlNode with hint) with a hit attribute(or
> > > > > trait), then I hope that the children (inputs) of it can see this hint, so
> > > > > to make some decisions if it should consume or propagate the hint.
> > > > >
> > > > > The problem I got here is the trait propagate from inputs from, which is
> > > > > the opposite as what I need, can you give some suggestions ? If I use
> > > > > MetadataHandler to cache and propagate the hints, how to propagate from
> > > > > parents to children ?
> > > > >
> > > > > Best,
> > > > > Danny Chan
> > > > > 在 2019年4月23日 +0800 AM3:14，Julian Hyde <jh...@apache.org>，写道：
> > > > > > TL;DR: RelNodes don’t really have parents. Be careful if you are relying
> > > > > on the parent concept too much. Rely on rules instead.
> > > > > >
> > > > > > In the Volcano model, a RelNode doesn’t really have a parent. It might
> > > > > be used in several places. (RelSet has a field ‘List<RelNode> parents’ that
> > > > > is kept up to date as planing progresses. But it’s really for Volcano’s
> > > > > internal use.)
> > > > > >
> > > > > > Even if you are not using Volcano, there are reasons to want the RelNode
> > > > > graph to be a dag, so again, a RelNode doesn’t have a unique parent.
> > > > > >
> > > > > > RelShuttleImpl has a stack. You can use that to find the parent. But the
> > > > > “parent” is just “where we came from as we traversed the RelNode graph”.
> > > > > There may be other “parents” that you do not know about.
> > > > > >
> > > > > > If you have a Project and want to find all parents that are Filters,
> > > > > don’t even think about “iterating over the parents” of the Project. Just
> > > > > write a rule that matches a Filter on a Project, and trust Volcano to do
> > > > > its job.
> > > > > >
> > > > > > Julian
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > > On Apr 22, 2019, at 6:15 AM, Yuzhao Chen <yu...@gmail.com> wrote:
> > > > > > >
> > > > > > > Thx, Stamatis, that somehow make sense, if i pass around the parent
> > > > > node every time I visit a RelNode and keep the parents in the cache, but it
> > > > > is still not that intuitive. Actually I what a to add a new RelTrait which
> > > > > bind to a specific scope, for example:
> > > > > > >
> > > > > > > join-rel(trait1)
> > > > > > > / \
> > > > > > > join2 join3
> > > > > > >
> > > > > > > Join-rel has a trait trait1, and I want all the children of join-rel
> > > > > can see this trait, with Calcite’s default metadata handler, I can only see
> > > > > the trait from children nodes(traits propagate from the inputs), and I have
> > > > > no idea how to propagate a trait reversely?
> > > > > > >
> > > > > > >
> > > > > > > Best,
> > > > > > > Danny Chan
> > > > > > > 在 2019年4月22日 +0800 PM8:44，Stamatis Zampetakis <za...@gmail.com>，写道：
> > > > > > > > Hi Danny,
> > > > > > > >
> > > > > > > > Apart from RelShuttle there is also RelVisitor which has a visit
> > > > > method
> > > > > > > > that provides the parent [1]. Not sure, if it suits your needs.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Stamatis
> > > > > > > >
> > > > > > > > [1]
> > > > > > > >
> > > > > https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/RelVisitor.java#L43
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Apr 22, 2019 at 2:14 PM Yuzhao Chen <yu...@gmail.com>
> > > > > wrote:
> > > > > > > >
> > > > > > > > > Now for RelNode, we have method getInput()[1] to fetch the input
> > > > > > > > > RelNodes, but how we fetch the parent ?
> > > > > > > > >
> > > > > > > > > For example, we have plan:
> > > > > > > > >
> > > > > > > > > join-rel
> > > > > > > > > / \
> > > > > > > > > scan1 scan2
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > We can get scan1 and scan2 in join-rel directly with method
> > > > > getInput, but
> > > > > > > > > how can we get the join rel in scan1 and scan 2 ?
> > > > > > > > >
> > > > > > > > > I know that there is a RelShuttle that can visit every RelNode and
> > > > > if I
> > > > > > > > > make a cache for the inputs mapping, finally I can get the
> > > > > ‘parents’ from
> > > > > > > > > the cache, but this is boring code and not that intuitive.
> > > > > > > > >
> > > > > > > > > Do you guys have any good ideas ?
> > > > > > > > >
> > > > > > > > > [1]
> > > > > > > > >
> > > > > https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/RelNode.java#L132
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Danny Chan
> > > > > > > > >
> > > > > >
> > > > >
> >

Re: How to traverse RelNode’s parent conviniently?

Posted by Yuzhao Chen <yu...@gmail.com>.

Thx, Julian

I think the hint path is a good way for searching RelNode’s parents, broadly, there may be these modules/things need to be modified:

1. Supports hints grammar for parser.jj
2. Cache the hints in the RelNode instance, and add method like RelNode#getHints() to fetch all the hints inherited for this node.
3. Modify #copy method for every kind of RelNode so that the hints can be copied when creating new equivalent nodes.
4. Add a visitor in after sql-to-rel phrase, to set up full hints list for every children RelNode if there exists any.
5. Add hints metadata handler and handles the hints fetching and overriding for specific kind of RelNode

The 2 and 3 are the modifications that i really want to confirm, that is, shall we store the hints in the RelNode instance ?

These are initial thoughts and if we make agreement, I would output a detail design doc which contains:

1. The hints grammar supported for the major sql engines
2. The hints grammar supported for Apache Calcite
3. The interface and design ideas of the proposed modifications


Best,
Danny Chan
在 2019年4月24日 +0800 AM3:04，Julian Hyde <jh...@apache.org>，写道：
> I see that if you have a hint on, say, the root node then it would be nice for its child or grand-child to be able to see that hint.
>
> How about giving each hint an inherit path? Thus given
>
> Filter Hint1
> +- Join
> +- Scan
> +- Project Hint2
> +- Scan
>
>
> Filter would have hints {Hint1[]}
> Join would have hints {Hint1[0]}
> Scan would have hints {Hint1[0, 0]}
> Project would have hints {Hint1[0,1], Hint2}
> Scan2 would have hints {[Hint1[0, 0, 1, 0], Hint2[0]}
>
> You could populate the hints and inherit paths with a single visitor pass after sql-to-rel conversion.
>
> By the way, I still like the idea of having kinds as a kind of RelMetadata, but I realize that a given RelNode might have more than one hint. So I think that the getHints(RelNode) method would return a List<Hint>, with Hint as follows:
>
> class Hint {
> public final List<Integer> inheritPath; // immutable, not null
> public final String type; // not null
> public final Object operand; // immutable, may be null, must be JSON data
> }
>
> operand must be JSON-style data (null, boolean, number, String, immutable List of JSON data, or immutable order-preserving Map from String to JSON data).
>
> > On Apr 23, 2019, at 1:25 AM, Yuzhao Chen <yu...@gmail.com> wrote:
> >
> > Thx, Andrew
> >
> > I don’t want to have a custom RelNode class, I hope all the work about hints would be contributed to the community. I want to find an acceptable way to keep and propagate the hints if we use the MetadataHandler to cache and query the hints.
> >
> > I don’t think the hints should be mixed into the cost model, that would make the cost computation very complex and hard to maintain, we only need the hints in our planning phrase to give suggestions, hints is more like another guideline for me and transparent to the planner.
> >
> > Best,
> > Danny Chan
> > 在 2019年4月23日 +0800 PM2:24，Андрей Цвелодуб <a....@gmail.com>，写道：
> > > Hi Danny,
> > >
> > > I would also agree with Julian on his position. I've tried to get around
> > > this limitation in several different ways, but none of it ended well :)
> > >
> > > For your idea with hints, if you have custom RelNode classes, you can add
> > > hint as an additional field of the class and you can write a simple rule
> > > that propagates the hint downwards, step by step. And also include the hint
> > > in your cost estimation, so that nodes with hints would be more attractive
> > > to the planner. I'm not sure this would be the most correct way to use the
> > > cost mechanism, but at least it is straightforward and it works.
> > >
> > > Best Regards,
> > > Andrew Tsvelodub
> > >
> > > On Tue, 23 Apr 2019 at 08:44, Yuzhao Chen <yu...@gmail.com> wrote:
> > >
> > > > Julian,
> > > >
> > > > I want to add hint support for Calcite, the initial idea was to tag a
> > > > RelNode(transformed from a SqlNode with hint) with a hit attribute(or
> > > > trait), then I hope that the children (inputs) of it can see this hint, so
> > > > to make some decisions if it should consume or propagate the hint.
> > > >
> > > > The problem I got here is the trait propagate from inputs from, which is
> > > > the opposite as what I need, can you give some suggestions ? If I use
> > > > MetadataHandler to cache and propagate the hints, how to propagate from
> > > > parents to children ?
> > > >
> > > > Best,
> > > > Danny Chan
> > > > 在 2019年4月23日 +0800 AM3:14，Julian Hyde <jh...@apache.org>，写道：
> > > > > TL;DR: RelNodes don’t really have parents. Be careful if you are relying
> > > > on the parent concept too much. Rely on rules instead.
> > > > >
> > > > > In the Volcano model, a RelNode doesn’t really have a parent. It might
> > > > be used in several places. (RelSet has a field ‘List<RelNode> parents’ that
> > > > is kept up to date as planing progresses. But it’s really for Volcano’s
> > > > internal use.)
> > > > >
> > > > > Even if you are not using Volcano, there are reasons to want the RelNode
> > > > graph to be a dag, so again, a RelNode doesn’t have a unique parent.
> > > > >
> > > > > RelShuttleImpl has a stack. You can use that to find the parent. But the
> > > > “parent” is just “where we came from as we traversed the RelNode graph”.
> > > > There may be other “parents” that you do not know about.
> > > > >
> > > > > If you have a Project and want to find all parents that are Filters,
> > > > don’t even think about “iterating over the parents” of the Project. Just
> > > > write a rule that matches a Filter on a Project, and trust Volcano to do
> > > > its job.
> > > > >
> > > > > Julian
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > On Apr 22, 2019, at 6:15 AM, Yuzhao Chen <yu...@gmail.com> wrote:
> > > > > >
> > > > > > Thx, Stamatis, that somehow make sense, if i pass around the parent
> > > > node every time I visit a RelNode and keep the parents in the cache, but it
> > > > is still not that intuitive. Actually I what a to add a new RelTrait which
> > > > bind to a specific scope, for example:
> > > > > >
> > > > > > join-rel(trait1)
> > > > > > / \
> > > > > > join2 join3
> > > > > >
> > > > > > Join-rel has a trait trait1, and I want all the children of join-rel
> > > > can see this trait, with Calcite’s default metadata handler, I can only see
> > > > the trait from children nodes(traits propagate from the inputs), and I have
> > > > no idea how to propagate a trait reversely?
> > > > > >
> > > > > >
> > > > > > Best,
> > > > > > Danny Chan
> > > > > > 在 2019年4月22日 +0800 PM8:44，Stamatis Zampetakis <za...@gmail.com>，写道：
> > > > > > > Hi Danny,
> > > > > > >
> > > > > > > Apart from RelShuttle there is also RelVisitor which has a visit
> > > > method
> > > > > > > that provides the parent [1]. Not sure, if it suits your needs.
> > > > > > >
> > > > > > > Best,
> > > > > > > Stamatis
> > > > > > >
> > > > > > > [1]
> > > > > > >
> > > > https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/RelVisitor.java#L43
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Apr 22, 2019 at 2:14 PM Yuzhao Chen <yu...@gmail.com>
> > > > wrote:
> > > > > > >
> > > > > > > > Now for RelNode, we have method getInput()[1] to fetch the input
> > > > > > > > RelNodes, but how we fetch the parent ?
> > > > > > > >
> > > > > > > > For example, we have plan:
> > > > > > > >
> > > > > > > > join-rel
> > > > > > > > / \
> > > > > > > > scan1 scan2
> > > > > > > >
> > > > > > > >
> > > > > > > > We can get scan1 and scan2 in join-rel directly with method
> > > > getInput, but
> > > > > > > > how can we get the join rel in scan1 and scan 2 ?
> > > > > > > >
> > > > > > > > I know that there is a RelShuttle that can visit every RelNode and
> > > > if I
> > > > > > > > make a cache for the inputs mapping, finally I can get the
> > > > ‘parents’ from
> > > > > > > > the cache, but this is boring code and not that intuitive.
> > > > > > > >
> > > > > > > > Do you guys have any good ideas ?
> > > > > > > >
> > > > > > > > [1]
> > > > > > > >
> > > > https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/RelNode.java#L132
> > > > > > > >
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Danny Chan
> > > > > > > >
> > > > >
> > > >
>

Re: How to traverse RelNode’s parent conviniently?

Posted by Julian Hyde <jh...@apache.org>.

I see that if you have a hint on, say, the root node then it would be nice for its child or grand-child to be able to see that hint.

How about giving each hint an inherit path? Thus given

 Filter Hint1
  +- Join
      +- Scan
      +- Project Hint2
         +- Scan


Filter would have hints {Hint1[]}
Join would have hints {Hint1[0]}
Scan would have hints {Hint1[0, 0]}
Project would have hints {Hint1[0,1], Hint2}
Scan2 would have hints {[Hint1[0, 0, 1, 0], Hint2[0]}

You could populate the hints and inherit paths with a single visitor pass after sql-to-rel conversion.

By the way, I still like the idea of having kinds as a kind of RelMetadata, but I realize that a given RelNode might have more than one hint. So I think that the getHints(RelNode) method would return a List<Hint>, with Hint as follows:

  class Hint {
    public final List<Integer> inheritPath; // immutable, not null
    public final String type; // not null
    public final Object operand; // immutable, may be null, must be JSON data
  }

operand must be JSON-style data (null, boolean, number, String, immutable List of JSON data, or immutable order-preserving Map from String to JSON data).

> On Apr 23, 2019, at 1:25 AM, Yuzhao Chen <yu...@gmail.com> wrote:
> 
> Thx, Andrew
> 
> I don’t want to have a custom RelNode class, I hope all the work about hints would be contributed to the community. I want to find an acceptable way to keep and propagate the hints if we use the MetadataHandler to cache and query the hints.
> 
> I don’t think the hints should be mixed into the cost model, that would make the cost computation very complex and hard to maintain, we only need the hints in our planning phrase to give suggestions, hints is more like another guideline for me and transparent to the planner.
> 
> Best,
> Danny Chan
> 在 2019年4月23日 +0800 PM2:24，Андрей Цвелодуб <a....@gmail.com>，写道：
>> Hi Danny,
>> 
>> I would also agree with Julian on his position. I've tried to get around
>> this limitation in several different ways, but none of it ended well :)
>> 
>> For your idea with hints, if you have custom RelNode classes, you can add
>> hint as an additional field of the class and you can write a simple rule
>> that propagates the hint downwards, step by step. And also include the hint
>> in your cost estimation, so that nodes with hints would be more attractive
>> to the planner. I'm not sure this would be the most correct way to use the
>> cost mechanism, but at least it is straightforward and it works.
>> 
>> Best Regards,
>> Andrew Tsvelodub
>> 
>> On Tue, 23 Apr 2019 at 08:44, Yuzhao Chen <yu...@gmail.com> wrote:
>> 
>>> Julian,
>>> 
>>> I want to add hint support for Calcite, the initial idea was to tag a
>>> RelNode(transformed from a SqlNode with hint) with a hit attribute(or
>>> trait), then I hope that the children (inputs) of it can see this hint, so
>>> to make some decisions if it should consume or propagate the hint.
>>> 
>>> The problem I got here is the trait propagate from inputs from, which is
>>> the opposite as what I need, can you give some suggestions ? If I use
>>> MetadataHandler to cache and propagate the hints, how to propagate from
>>> parents to children ?
>>> 
>>> Best,
>>> Danny Chan
>>> 在 2019年4月23日 +0800 AM3:14，Julian Hyde <jh...@apache.org>，写道：
>>>> TL;DR: RelNodes don’t really have parents. Be careful if you are relying
>>> on the parent concept too much. Rely on rules instead.
>>>> 
>>>> In the Volcano model, a RelNode doesn’t really have a parent. It might
>>> be used in several places. (RelSet has a field ‘List<RelNode> parents’ that
>>> is kept up to date as planing progresses. But it’s really for Volcano’s
>>> internal use.)
>>>> 
>>>> Even if you are not using Volcano, there are reasons to want the RelNode
>>> graph to be a dag, so again, a RelNode doesn’t have a unique parent.
>>>> 
>>>> RelShuttleImpl has a stack. You can use that to find the parent. But the
>>> “parent” is just “where we came from as we traversed the RelNode graph”.
>>> There may be other “parents” that you do not know about.
>>>> 
>>>> If you have a Project and want to find all parents that are Filters,
>>> don’t even think about “iterating over the parents” of the Project. Just
>>> write a rule that matches a Filter on a Project, and trust Volcano to do
>>> its job.
>>>> 
>>>> Julian
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On Apr 22, 2019, at 6:15 AM, Yuzhao Chen <yu...@gmail.com> wrote:
>>>>> 
>>>>> Thx, Stamatis, that somehow make sense, if i pass around the parent
>>> node every time I visit a RelNode and keep the parents in the cache, but it
>>> is still not that intuitive. Actually I what a to add a new RelTrait which
>>> bind to a specific scope, for example:
>>>>> 
>>>>> join-rel(trait1)
>>>>> / \
>>>>> join2 join3
>>>>> 
>>>>> Join-rel has a trait trait1, and I want all the children of join-rel
>>> can see this trait, with Calcite’s default metadata handler, I can only see
>>> the trait from children nodes(traits propagate from the inputs), and I have
>>> no idea how to propagate a trait reversely?
>>>>> 
>>>>> 
>>>>> Best,
>>>>> Danny Chan
>>>>> 在 2019年4月22日 +0800 PM8:44，Stamatis Zampetakis <za...@gmail.com>，写道：
>>>>>> Hi Danny,
>>>>>> 
>>>>>> Apart from RelShuttle there is also RelVisitor which has a visit
>>> method
>>>>>> that provides the parent [1]. Not sure, if it suits your needs.
>>>>>> 
>>>>>> Best,
>>>>>> Stamatis
>>>>>> 
>>>>>> [1]
>>>>>> 
>>> https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/RelVisitor.java#L43
>>>>>> 
>>>>>> 
>>>>>> On Mon, Apr 22, 2019 at 2:14 PM Yuzhao Chen <yu...@gmail.com>
>>> wrote:
>>>>>> 
>>>>>>> Now for RelNode, we have method getInput()[1] to fetch the input
>>>>>>> RelNodes, but how we fetch the parent ?
>>>>>>> 
>>>>>>> For example, we have plan:
>>>>>>> 
>>>>>>> join-rel
>>>>>>> / \
>>>>>>> scan1 scan2
>>>>>>> 
>>>>>>> 
>>>>>>> We can get scan1 and scan2 in join-rel directly with method
>>> getInput, but
>>>>>>> how can we get the join rel in scan1 and scan 2 ?
>>>>>>> 
>>>>>>> I know that there is a RelShuttle that can visit every RelNode and
>>> if I
>>>>>>> make a cache for the inputs mapping, finally I can get the
>>> ‘parents’ from
>>>>>>> the cache, but this is boring code and not that intuitive.
>>>>>>> 
>>>>>>> Do you guys have any good ideas ?
>>>>>>> 
>>>>>>> [1]
>>>>>>> 
>>> https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/RelNode.java#L132
>>>>>>> 
>>>>>>> 
>>>>>>> Best,
>>>>>>> Danny Chan
>>>>>>> 
>>>> 
>>>

Re: How to traverse RelNode’s parent conviniently?

Posted by Yuzhao Chen <yu...@gmail.com>.

Thx, Andrew

I don’t want to have a custom RelNode class, I hope all the work about hints would be contributed to the community. I want to find an acceptable way to keep and propagate the hints if we use the MetadataHandler to cache and query the hints.

I don’t think the hints should be mixed into the cost model, that would make the cost computation very complex and hard to maintain, we only need the hints in our planning phrase to give suggestions, hints is more like another guideline for me and transparent to the planner.

Best,
Danny Chan
在 2019年4月23日 +0800 PM2:24，Андрей Цвелодуб <a....@gmail.com>，写道：
> Hi Danny,
>
> I would also agree with Julian on his position. I've tried to get around
> this limitation in several different ways, but none of it ended well :)
>
> For your idea with hints, if you have custom RelNode classes, you can add
> hint as an additional field of the class and you can write a simple rule
> that propagates the hint downwards, step by step. And also include the hint
> in your cost estimation, so that nodes with hints would be more attractive
> to the planner. I'm not sure this would be the most correct way to use the
> cost mechanism, but at least it is straightforward and it works.
>
> Best Regards,
> Andrew Tsvelodub
>
> On Tue, 23 Apr 2019 at 08:44, Yuzhao Chen <yu...@gmail.com> wrote:
>
> > Julian,
> >
> > I want to add hint support for Calcite, the initial idea was to tag a
> > RelNode(transformed from a SqlNode with hint) with a hit attribute(or
> > trait), then I hope that the children (inputs) of it can see this hint, so
> > to make some decisions if it should consume or propagate the hint.
> >
> > The problem I got here is the trait propagate from inputs from, which is
> > the opposite as what I need, can you give some suggestions ? If I use
> > MetadataHandler to cache and propagate the hints, how to propagate from
> > parents to children ?
> >
> > Best,
> > Danny Chan
> > 在 2019年4月23日 +0800 AM3:14，Julian Hyde <jh...@apache.org>，写道：
> > > TL;DR: RelNodes don’t really have parents. Be careful if you are relying
> > on the parent concept too much. Rely on rules instead.
> > >
> > > In the Volcano model, a RelNode doesn’t really have a parent. It might
> > be used in several places. (RelSet has a field ‘List<RelNode> parents’ that
> > is kept up to date as planing progresses. But it’s really for Volcano’s
> > internal use.)
> > >
> > > Even if you are not using Volcano, there are reasons to want the RelNode
> > graph to be a dag, so again, a RelNode doesn’t have a unique parent.
> > >
> > > RelShuttleImpl has a stack. You can use that to find the parent. But the
> > “parent” is just “where we came from as we traversed the RelNode graph”.
> > There may be other “parents” that you do not know about.
> > >
> > > If you have a Project and want to find all parents that are Filters,
> > don’t even think about “iterating over the parents” of the Project. Just
> > write a rule that matches a Filter on a Project, and trust Volcano to do
> > its job.
> > >
> > > Julian
> > >
> > >
> > >
> > >
> > > > On Apr 22, 2019, at 6:15 AM, Yuzhao Chen <yu...@gmail.com> wrote:
> > > >
> > > > Thx, Stamatis, that somehow make sense, if i pass around the parent
> > node every time I visit a RelNode and keep the parents in the cache, but it
> > is still not that intuitive. Actually I what a to add a new RelTrait which
> > bind to a specific scope, for example:
> > > >
> > > > join-rel(trait1)
> > > > / \
> > > > join2 join3
> > > >
> > > > Join-rel has a trait trait1, and I want all the children of join-rel
> > can see this trait, with Calcite’s default metadata handler, I can only see
> > the trait from children nodes(traits propagate from the inputs), and I have
> > no idea how to propagate a trait reversely?
> > > >
> > > >
> > > > Best,
> > > > Danny Chan
> > > > 在 2019年4月22日 +0800 PM8:44，Stamatis Zampetakis <za...@gmail.com>，写道：
> > > > > Hi Danny,
> > > > >
> > > > > Apart from RelShuttle there is also RelVisitor which has a visit
> > method
> > > > > that provides the parent [1]. Not sure, if it suits your needs.
> > > > >
> > > > > Best,
> > > > > Stamatis
> > > > >
> > > > > [1]
> > > > >
> > https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/RelVisitor.java#L43
> > > > >
> > > > >
> > > > > On Mon, Apr 22, 2019 at 2:14 PM Yuzhao Chen <yu...@gmail.com>
> > wrote:
> > > > >
> > > > > > Now for RelNode, we have method getInput()[1] to fetch the input
> > > > > > RelNodes, but how we fetch the parent ?
> > > > > >
> > > > > > For example, we have plan:
> > > > > >
> > > > > > join-rel
> > > > > > / \
> > > > > > scan1 scan2
> > > > > >
> > > > > >
> > > > > > We can get scan1 and scan2 in join-rel directly with method
> > getInput, but
> > > > > > how can we get the join rel in scan1 and scan 2 ?
> > > > > >
> > > > > > I know that there is a RelShuttle that can visit every RelNode and
> > if I
> > > > > > make a cache for the inputs mapping, finally I can get the
> > ‘parents’ from
> > > > > > the cache, but this is boring code and not that intuitive.
> > > > > >
> > > > > > Do you guys have any good ideas ?
> > > > > >
> > > > > > [1]
> > > > > >
> > https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/RelNode.java#L132
> > > > > >
> > > > > >
> > > > > > Best,
> > > > > > Danny Chan
> > > > > >
> > >
> >

Re: How to traverse RelNode’s parent conviniently?

Posted by Андрей Цвелодуб <a....@gmail.com>.

Hi Danny,

I would also agree with Julian on his position. I've tried to get around
this limitation in several different ways, but none of it ended well :)

For your idea with hints, if you have custom RelNode classes, you can add
hint as an additional field of the class and you can write a simple rule
that propagates the hint downwards, step by step. And also include the hint
in your cost estimation, so that nodes with hints would be more attractive
to the planner. I'm not sure this would be the most correct way to use the
cost mechanism, but at least it is straightforward and it works.

Best Regards,
Andrew Tsvelodub

On Tue, 23 Apr 2019 at 08:44, Yuzhao Chen <yu...@gmail.com> wrote:

> Julian,
>
> I want to add hint support for Calcite, the initial idea was to tag a
> RelNode(transformed from a SqlNode with hint) with a hit attribute(or
> trait), then I hope that the children (inputs) of it can see this hint, so
> to make some decisions if it should consume or propagate the hint.
>
> The problem I got here is the trait propagate from inputs from, which is
> the opposite as what I need, can you give some suggestions ? If I use
> MetadataHandler to cache and propagate the hints, how to propagate from
> parents to children ?
>
> Best,
> Danny Chan
> 在 2019年4月23日 +0800 AM3:14，Julian Hyde <jh...@apache.org>，写道：
> > TL;DR: RelNodes don’t really have parents. Be careful if you are relying
> on the parent concept too much. Rely on rules instead.
> >
> > In the Volcano model, a RelNode doesn’t really have a parent. It might
> be used in several places. (RelSet has a field ‘List<RelNode> parents’ that
> is kept up to date as planing progresses. But it’s really for Volcano’s
> internal use.)
> >
> > Even if you are not using Volcano, there are reasons to want the RelNode
> graph to be a dag, so again, a RelNode doesn’t have a unique parent.
> >
> > RelShuttleImpl has a stack. You can use that to find the parent. But the
> “parent” is just “where we came from as we traversed the RelNode graph”.
> There may be other “parents” that you do not know about.
> >
> > If you have a Project and want to find all parents that are Filters,
> don’t even think about “iterating over the parents” of the Project. Just
> write a rule that matches a Filter on a Project, and trust Volcano to do
> its job.
> >
> > Julian
> >
> >
> >
> >
> > > On Apr 22, 2019, at 6:15 AM, Yuzhao Chen <yu...@gmail.com> wrote:
> > >
> > > Thx, Stamatis, that somehow make sense, if i pass around the parent
> node every time I visit a RelNode and keep the parents in the cache, but it
> is still not that intuitive. Actually I what a to add a new RelTrait which
> bind to a specific scope, for example:
> > >
> > > join-rel(trait1)
> > > / \
> > > join2 join3
> > >
> > > Join-rel has a trait trait1, and I want all the children of join-rel
> can see this trait, with Calcite’s default metadata handler, I can only see
> the trait from children nodes(traits propagate from the inputs), and I have
> no idea how to propagate a trait reversely?
> > >
> > >
> > > Best,
> > > Danny Chan
> > > 在 2019年4月22日 +0800 PM8:44，Stamatis Zampetakis <za...@gmail.com>，写道：
> > > > Hi Danny,
> > > >
> > > > Apart from RelShuttle there is also RelVisitor which has a visit
> method
> > > > that provides the parent [1]. Not sure, if it suits your needs.
> > > >
> > > > Best,
> > > > Stamatis
> > > >
> > > > [1]
> > > >
> https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/RelVisitor.java#L43
> > > >
> > > >
> > > > On Mon, Apr 22, 2019 at 2:14 PM Yuzhao Chen <yu...@gmail.com>
> wrote:
> > > >
> > > > > Now for RelNode, we have method getInput()[1] to fetch the input
> > > > > RelNodes, but how we fetch the parent ?
> > > > >
> > > > > For example, we have plan:
> > > > >
> > > > > join-rel
> > > > > / \
> > > > > scan1 scan2
> > > > >
> > > > >
> > > > > We can get scan1 and scan2 in join-rel directly with method
> getInput, but
> > > > > how can we get the join rel in scan1 and scan 2 ?
> > > > >
> > > > > I know that there is a RelShuttle that can visit every RelNode and
> if I
> > > > > make a cache for the inputs mapping, finally I can get the
> ‘parents’ from
> > > > > the cache, but this is boring code and not that intuitive.
> > > > >
> > > > > Do you guys have any good ideas ?
> > > > >
> > > > > [1]
> > > > >
> https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/RelNode.java#L132
> > > > >
> > > > >
> > > > > Best,
> > > > > Danny Chan
> > > > >
> >
>

Re: How to traverse RelNode’s parent conviniently?

Posted by Yuzhao Chen <yu...@gmail.com>.

Julian,

I want to add hint support for Calcite, the initial idea was to tag a RelNode(transformed from a SqlNode with hint) with a hit attribute(or trait), then I hope that the children (inputs) of it can see this hint, so to make some decisions if it should consume or propagate the hint.

The problem I got here is the trait propagate from inputs from, which is the opposite as what I need, can you give some suggestions ? If I use MetadataHandler to cache and propagate the hints, how to propagate from parents to children ?

Best,
Danny Chan
在 2019年4月23日 +0800 AM3:14，Julian Hyde <jh...@apache.org>，写道：
> TL;DR: RelNodes don’t really have parents. Be careful if you are relying on the parent concept too much. Rely on rules instead.
>
> In the Volcano model, a RelNode doesn’t really have a parent. It might be used in several places. (RelSet has a field ‘List<RelNode> parents’ that is kept up to date as planing progresses. But it’s really for Volcano’s internal use.)
>
> Even if you are not using Volcano, there are reasons to want the RelNode graph to be a dag, so again, a RelNode doesn’t have a unique parent.
>
> RelShuttleImpl has a stack. You can use that to find the parent. But the “parent” is just “where we came from as we traversed the RelNode graph”. There may be other “parents” that you do not know about.
>
> If you have a Project and want to find all parents that are Filters, don’t even think about “iterating over the parents” of the Project. Just write a rule that matches a Filter on a Project, and trust Volcano to do its job.
>
> Julian
>
>
>
>
> > On Apr 22, 2019, at 6:15 AM, Yuzhao Chen <yu...@gmail.com> wrote:
> >
> > Thx, Stamatis, that somehow make sense, if i pass around the parent node every time I visit a RelNode and keep the parents in the cache, but it is still not that intuitive. Actually I what a to add a new RelTrait which bind to a specific scope, for example:
> >
> > join-rel(trait1)
> > / \
> > join2 join3
> >
> > Join-rel has a trait trait1, and I want all the children of join-rel can see this trait, with Calcite’s default metadata handler, I can only see the trait from children nodes(traits propagate from the inputs), and I have no idea how to propagate a trait reversely?
> >
> >
> > Best,
> > Danny Chan
> > 在 2019年4月22日 +0800 PM8:44，Stamatis Zampetakis <za...@gmail.com>，写道：
> > > Hi Danny,
> > >
> > > Apart from RelShuttle there is also RelVisitor which has a visit method
> > > that provides the parent [1]. Not sure, if it suits your needs.
> > >
> > > Best,
> > > Stamatis
> > >
> > > [1]
> > > https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/RelVisitor.java#L43
> > >
> > >
> > > On Mon, Apr 22, 2019 at 2:14 PM Yuzhao Chen <yu...@gmail.com> wrote:
> > >
> > > > Now for RelNode, we have method getInput()[1] to fetch the input
> > > > RelNodes, but how we fetch the parent ?
> > > >
> > > > For example, we have plan:
> > > >
> > > > join-rel
> > > > / \
> > > > scan1 scan2
> > > >
> > > >
> > > > We can get scan1 and scan2 in join-rel directly with method getInput, but
> > > > how can we get the join rel in scan1 and scan 2 ?
> > > >
> > > > I know that there is a RelShuttle that can visit every RelNode and if I
> > > > make a cache for the inputs mapping, finally I can get the ‘parents’ from
> > > > the cache, but this is boring code and not that intuitive.
> > > >
> > > > Do you guys have any good ideas ?
> > > >
> > > > [1]
> > > > https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/RelNode.java#L132
> > > >
> > > >
> > > > Best,
> > > > Danny Chan
> > > >
>

Re: How to traverse RelNode’s parent conviniently?

Posted by Julian Hyde <jh...@apache.org>.

TL;DR: RelNodes don’t really have parents. Be careful if you are relying on the parent concept too much. Rely on rules instead.

In the Volcano model, a RelNode doesn’t really have a parent. It might be used in several places. (RelSet has a field ‘List<RelNode> parents’ that is kept up to date as planing progresses. But it’s really for Volcano’s internal use.)

Even if you are not using Volcano, there are reasons to want the RelNode graph to be a dag, so again, a RelNode doesn’t have a unique parent.

RelShuttleImpl has a stack. You can use that to find the parent. But the “parent” is just “where we came from as we traversed the RelNode graph”. There may be other “parents” that you do not know about.

If you have a Project and want to find all parents that are Filters, don’t even think about “iterating over the parents” of the Project. Just write a rule that matches a Filter on a Project, and trust Volcano to do its job.

Julian

> On Apr 22, 2019, at 6:15 AM, Yuzhao Chen <yu...@gmail.com> wrote:
> 
> Thx, Stamatis, that somehow make sense, if i pass around the parent node every time I visit a RelNode and keep the parents in the cache, but it is still not that intuitive. Actually I what a to add a new RelTrait which bind to a specific scope, for example:
> 
>          join-rel(trait1)
>        /           \
>     join2     join3
> 
> Join-rel has a trait trait1, and I want all the children of join-rel can see this trait, with Calcite’s default metadata handler, I can only see the trait from children nodes(traits propagate from the inputs), and I have no idea how to propagate a trait reversely?
> 
> 
> Best,
> Danny Chan
> 在 2019年4月22日 +0800 PM8:44，Stamatis Zampetakis <za...@gmail.com>，写道：
>> Hi Danny,
>> 
>> Apart from RelShuttle there is also RelVisitor which has a visit method
>> that provides the parent [1]. Not sure, if it suits your needs.
>> 
>> Best,
>> Stamatis
>> 
>> [1]
>> https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/RelVisitor.java#L43
>> 
>> 
>> On Mon, Apr 22, 2019 at 2:14 PM Yuzhao Chen <yu...@gmail.com> wrote:
>> 
>>> Now for RelNode, we have method getInput()[1] to fetch the input
>>> RelNodes, but how we fetch the parent ?
>>> 
>>> For example, we have plan:
>>> 
>>> join-rel
>>> / \
>>> scan1 scan2
>>> 
>>> 
>>> We can get scan1 and scan2 in join-rel directly with method getInput, but
>>> how can we get the join rel in scan1 and scan 2 ?
>>> 
>>> I know that there is a RelShuttle that can visit every RelNode and if I
>>> make a cache for the inputs mapping, finally I can get the ‘parents’ from
>>> the cache, but this is boring code and not that intuitive.
>>> 
>>> Do you guys have any good ideas ?
>>> 
>>> [1]
>>> https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/RelNode.java#L132
>>> 
>>> 
>>> Best,
>>> Danny Chan
>>>

Re: How to traverse RelNode’s parent conviniently?

Posted by Yuzhao Chen <yu...@gmail.com>.

Thx, Stamatis, that somehow make sense, if i pass around the parent node every time I visit a RelNode and keep the parents in the cache, but it is still not that intuitive. Actually I what a to add a new RelTrait which bind to a specific scope, for example:

         join-rel(trait1)
       /           \
    join2     join3

Join-rel has a trait trait1, and I want all the children of join-rel can see this trait, with Calcite’s default metadata handler, I can only see the trait from children nodes(traits propagate from the inputs), and I have no idea how to propagate a trait reversely?


Best,
Danny Chan
在 2019年4月22日 +0800 PM8:44，Stamatis Zampetakis <za...@gmail.com>，写道：
> Hi Danny,
>
> Apart from RelShuttle there is also RelVisitor which has a visit method
> that provides the parent [1]. Not sure, if it suits your needs.
>
> Best,
> Stamatis
>
> [1]
> https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/RelVisitor.java#L43
>
>
> On Mon, Apr 22, 2019 at 2:14 PM Yuzhao Chen <yu...@gmail.com> wrote:
>
> > Now for RelNode, we have method getInput()[1] to fetch the input
> > RelNodes, but how we fetch the parent ?
> >
> > For example, we have plan:
> >
> > join-rel
> > / \
> > scan1 scan2
> >
> >
> > We can get scan1 and scan2 in join-rel directly with method getInput, but
> > how can we get the join rel in scan1 and scan 2 ?
> >
> > I know that there is a RelShuttle that can visit every RelNode and if I
> > make a cache for the inputs mapping, finally I can get the ‘parents’ from
> > the cache, but this is boring code and not that intuitive.
> >
> > Do you guys have any good ideas ?
> >
> > [1]
> > https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/RelNode.java#L132
> >
> >
> > Best,
> > Danny Chan
> >

Re: How to traverse RelNode’s parent conviniently?

Posted by Stamatis Zampetakis <za...@gmail.com>.

Hi Danny,

Apart from RelShuttle there is also RelVisitor which has a visit method
that provides the parent [1]. Not sure, if it suits your needs.

Best,
Stamatis

[1]
https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/RelVisitor.java#L43


On Mon, Apr 22, 2019 at 2:14 PM Yuzhao Chen <yu...@gmail.com> wrote:

> Now for RelNode, we have method getInput()[1]  to fetch the input
> RelNodes, but how we fetch the parent ?
>
> For example, we have plan:
>
>       join-rel
>     /             \
> scan1     scan2
>
>
> We can get scan1 and scan2 in join-rel directly with  method getInput, but
> how can we get the join rel in scan1 and scan 2 ?
>
> I know that there is a RelShuttle that can visit every RelNode and if I
> make a cache for the inputs mapping, finally I can get the ‘parents’ from
> the cache, but this is boring code and not that intuitive.
>
> Do you guys have any good ideas ?
>
> [1]
> https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/RelNode.java#L132
>
>
> Best,
> Danny Chan
>