You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Haisheng Yuan <hy...@apache.org> on 2022/03/07 23:58:30 UTC

Re: Allow Cascades driver invoking "derive" on the nodes produced by "passThrough"

Hi Vladimir,

If I understand it correctly, your concern is still about [1], right?
If it is, I think I have answered it in [2], perhaps it didn't jump into your inbox.

I am still not convinced that enforcing derivation on all rels is the only way to solve your concern.

If it is the other problem, can you give us a concrete example? So we can discuss and solve it.

Thanks,
Haisheng Yuan

[1] https://lists.apache.org/thread/4rcvlk1oprbnbgbnwl2s735p99h4vj80
[2] https://lists.apache.org/thread/s17tsj3pmccfrfvydnmv76btc3voxohj

On 2022/02/13 09:40:05 Roman Kondakov wrote:
> Hi Alessandro,
> 
> this problem was already discussed on dev-list [1] and we have a ticket 
> for this [2].
> 
> My concern is that many projects use Calcite as a Lego kit: they took 
> internal components of Calcite and combine them for building a custom 
> planning and execution pipeline. And sometimes downstream projects need 
> to change the default behavior of internal components to fit their 
> requirements or overcome the bug. So the idea of keeping even internal 
> components of Calcite "more public" is rather a good thing than the bad 
> one from my point of view.
> 
> Thank you.
> 
> [1] https://lists.apache.org/thread/cykl74dcphgow4790fwoc8frsjglz7n1
> 
> [2] https://issues.apache.org/jira/browse/CALCITE-4542
> 
> --
> Roman Kondakov
> 
> 
> On 11.02.2022 19:15, Alessandro Solimando wrote:
> > Hello everyone,
> > @Vladimir, +1 on the change introducing "enforceDerive()".
> >
> > @Roman, could you walk us through the limitations you found that forced you
> > to copy-paste the whole class?
> >
> > Maybe there is some middle ground for your problem(s) too, similar in
> > spirit to what Vladimir proposed for the other limitation.
> >
> > I am not against making the class more public if necessary, but it would be
> > nice to have a discussion here before going down that path.
> > If the discussion leads to a better design of the original class, all
> > projects would benefit from that.
> >
> > Best regards,
> > Alessandro
> >
> > On Fri, 11 Feb 2022 at 04:14, Roman Kondakov <ko...@mail.ru.invalid>
> > wrote:
> >
> >> Hi Vladimir,
> >>
> >> +1 for making the rule driver more public. We've faced similar problems
> >> in the downstream project. The solution was to copy and paste the
> >> TopDownRuleDrive code with small fixes since it was not possible to
> >> override the default behavior.
> >>
> >> --
> >> Roman Kondakov
> >>
> >>
> >> On 11.02.2022 02:50, Vladimir Ozerov wrote:
> >>> Hi,
> >>>
> >>> In the Cascades driver, it is possible to propagate the requests top-down
> >>> using the "passThrough", method and then notify parents bottom-up about
> >> the
> >>> concrete physical implementations of inputs using the "derive" method.
> >>>
> >>> In some optimizers, the valid parent node cannot be created before the
> >>> trait sets of inputs are known. An example is a custom distribution trait
> >>> that includes the number of shards in the system. The parent operator
> >> alone
> >>> may guess the distribution keys, but cannot know the number of input
> >>> shards. To mitigate this, you may create a "template" node with an
> >> infinite
> >>> cost from within the optimization rule that will propagate the
> >>> passThrough/drive calls but would never participate in the final plan.
> >>>
> >>> Currency, the top-down driver designed in a way that the nodes created
> >> from
> >>> the "passThrough" method are not notified on the "derive" stage. This
> >> leads
> >>> to the incomplete exploration of the search space. For example, the rule
> >>> may produce the node "A1.template" that will be converted into a normal
> >>> "A1" node in the derive phase. However, if the parent operator produced
> >>> "A2.template" from "A1.template" using pass-through mechanics, the
> >>> "A2.template" will never be notified about the concrete input traits,
> >>> possibly losing the optimal plan. This is especially painful in
> >> distributed
> >>> engines, where the number of shards is important for the placement of
> >>> Shuffle operators.
> >>>
> >>> It seems that the problem could be solved with relatively low effort. The
> >>> "derive" is not invoked on the nodes created from the "passThrough"
> >> method,
> >>> because such nodes are placed in the "passThroughCache" collection.
> >> Instead
> >>> of doing this unconditionally, we may introduce an additional predicate
> >>> that would selectively enforce "derive" on such nodes. For example, this
> >>> could be a default method in the PhysicalNode interface, like:
> >>>
> >>> interface PhysicalNode {
> >>>     default boolean enforceDerive() { return false; }
> >>> }
> >>>
> >>> If there are no objections, I'll proceed with this change.
> >>>
> >>> Alternatively, we may make the TopDownRuleDriver more "public", so that
> >> the
> >>> user can extend it and decide within the driver whether to cache a
> >>> particular node or not.
> >>>
> >>> I would appreciate your feedback on the matter.
> >>>
> >>> Regards,
> >>> Vladimir.
> >>>
>