You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@calcite.apache.org by Haisheng Yuan <h....@alibaba-inc.com> on 2019/10/18 02:55:57 UTC

[DISCUSS] On-demand traitset request

TL;DR
Both top-down physical TraitSet request and bottom-up TraitSet
derivation have their strongth and weakness, we propose 
on-demand TraitSet request to combine the above two, to reduce
the number of plan alternatives that are genereated, especially 
in distributed system.

e.g.
select * from foo join bar on f1=b1 and f2=b2 and f3=b3;

In non-distributed system, we can generate a sort merge join, 
requesting foo sorted by f1,f2,f3 and bar sorted by b1,b2,b3.  
But if foo happens to be sorted by f3,f2,f1, we may miss the 
chance of making use of the delivered ordering of foo. Because
if we require bar to be sorted by b3,b2,b1, we don't need to
sort on foo anymore. There are so many choices, n!, not even
considering asc/desc and null direction. We can't request all
the possible traitsets in top-down way, and can't derive all the
possible traitsets in bottom-up way either.

We propose on-demand traitset request by adding a new type
of metadata DerivedTraitSets into the built-in metadata system.

List<RelTraitSet> deriveTraitSets(RelNode, RelMetadataQuery)

In this metadata, every operator returns several possbile traitsets
that may be derived from this operator.

Using above query as an example, the tablescan on foo should
return traiset with collation on f3, f2, f1.

In physical implementation rules, e.g. the SortMergeJoinRule,
it gets possible traitsets from both child operators, uses the join
keys to eliminate useless traitsets, leaves out usefull traitsets,
and requests corresponding traitset on the other child.

This relies on the feature of AbstractConverter, which is turned
off by default, due to performance issue [1].

Thoughts?

[1] https://issues.apache.org/jira/browse/CALCITE-2970

Haisheng

Re: Re: Re: [DISCUSS] On-demand traitset request

Posted by Haisheng Yuan <h....@alibaba-inc.com>.

It can be extended to other traits easily. The APIs for distribution and collation are for convenience, as all the databases have these traits, for single-node database, distribution can just be ANY.
public <T extends RelTrait> T requiredTrait(RelTraitDef<T> traitDef, RelTrait required, int child, int optReqId)
public <T extends RelTrait> T derivedTrait(RelTraitDef<T> traitDef)
- Haisheng

------------------------------------------------------------------
发件人：Stamatis Zampetakis<za...@gmail.com>
日　期：2019年10月23日 14:53:38
收件人：<de...@calcite.apache.org>
主　题：Re: Re: [DISCUSS] On-demand traitset request

Overall, I agree that better encapsulation of propagation and derivation of
traits would be beneficial for our system.

Regarding the API proposed by Haisheng, I have to think a bit more on it.
At first glance, adding such methods directly in the RelNode API does not
appear an ideal solution since I don't see how easily it can be extended to
support other kinds of traits.

Best,
Stamatis

On Mon, Oct 21, 2019 at 7:31 AM Haisheng Yuan <h....@alibaba-inc.com>
wrote:

> To Stamatis,
> Not exactly. My initial thought was giving the physical operator the
> abiity to customize and fully control physical property derivation
> strategy, thus can further help the purpose driven trait request. But since
> we agree to think more high-level API to support on-demand traitset
> request, I will illustrate what API is expected from implentator's
> perspective.
>
> Jingfeng gave us basic steps on how the plan might be generated using
> top-down purpose driven only manner, I think differently with the first
> several steps.
>
> SELECT DISTINCT c, b FROM
> ( SELECT R.c c, S.b b FROM R, S
> WHERE R.a=S.a and R.b=S.b and R.c=S.c) t;
>
> Aggregate . (c, b)
> +--- MergeJoin . (a, b, c)
> |--- TableScan on R
> +-- TableScan on S
>
> 1. Aggreate require collation (c,b) from its child, not permutation.
> 2. MergeJoin's parent require (c,b), it has 2 options. Pass it down, or
> ignore it.
> a) Pass down. it has join condition on (a,b,c), the required columns
> can be coverd by join condition columns, so MergeJoin will try to deliver
> (c,b,a), and both children must exact match. Then we will have sort on both
> children of MergeJoin.
> b) Ignore it. Require its first child collation on (a,b,c), but
> matching type is subset. R delivers (c,b,a). Then using the first child's
> derived collation trait to require its second child to exact match. Thus we
> have a sort on S, and a sort on top of MergeJoin.
>
> Both plan might be good or bad. If R, S are large, but the join result is
> small, plan b) might be better, otherwise plan a) might be better.
>
> Anyway, I hope the physical operators can have full control the physical
> properties requests and derivation, in physical operator class itself, not
> rules, not other places.
>
> Per our experience, we have spent too much time on writing code for
> dealing with all kinds of property requirement and derivation. But in fact,
> life should be easier. I would like to the physical operator provides the
> following API, and the 3rd party implementator just need to
> override/implement them, no more need to be taken care.
>
> 1. void setDistributionRequests(int numReq)
> Each operator can specify how many optimzation requests on some trait it
> want to do. e.g. HashJoin may request the following distribution on both
> children:
> - (hash distribution on key1, hash distribution on key1)
> - (hash distribution on key2, hash distribution on key2)
> - (hash distribution on all keys, hash distribution on all keys)
> - (Any, Broadcast)
> - (Gather, Gather)
>
> 2. RelDistribution requiredDistribution(RelDistribution required, int
> child) //same for collation
> Given the required distribution from parent operator, returns the required
> distribution for its nth child.
>
> 3. RelDistribution derivedDistribution() //same for collation
> Derive the distribution of the operator itelf from child operators.
>
> 4. MatchType distributionMatchType(int child) //same for collation
> Returns the distribution match type for its nth child, how does it match
> the other children.
> Similar with Jinfeng's point, I think there should be 3 types of matching:
> exact, satisfy, subset.
> e.g.
> R is distributed by (a), S is distributed by (a,b)
> select * from R join S using a,b,c
> If we have plan
> HashJoin
> |-- TableScan on R
> +-- TableScan on S
> We may require the match type on S to be satisfy. (a,b) satisfies required
> distribution (a,b,c).
> Fot the outer child R, we require it to be exact match with inner.
>
> 5. ExecOrder getExecOrder()
> Returns how the operator's children is executed, left to right, or right
> to left. Typically, hash join is right to left. We might use this as the
> optimization order. To make sure we have correct plans, we have to optimize
> child and enforce properties in the order that is specific to the physical
> operator.
> All the other dirty work should be done by the optimization engine, but
> not through rules, I believe. However, I havn't got any clear plan on how
> to achieve it inside the engine.
>
> Haisheng
>
> ------------------------------------------------------------------
> 发件人：Jacques Nadeau<ja...@apache.org>
> 日 期：2019年10月21日 11:04:19
> 收件人：<de...@calcite.apache.org>
> 主 题：Re: [DISCUSS] On-demand traitset request
>
> Definitely agree that this has been a long time missing. I've been
> challenged by this absence since before Calcite was Calcite. I also
> remember the trials and tribulations around this that Jinfeng references
> above.
>
> In general, I think the first thing one might want to before actually doing
> this is to make trait derivation internally defined based on the impact
> that a rel node has on traits. I've always found the externally provided
> rel traits to be problematic and a potential place for hidden bugs (row
> type has the same problem) . It means that trait derivation of a relnode is
> based on the rules that do transformation as opposed to the "physical"
> impact of the relnode. (It also leads to derivation behavior for a relnode
> being scattered in many different rules.) If moved to the rel node, it also
> provides a second benefit, once you encapsulate this propagation logic, you
> could also expose this as a trait derivation function that the planner
> could use to seek out derivation paths.
>
> At Dremio we toyed last year with the idea of adding a heuristic cycle on
> top of the existing volano planner and relset state. In this model a
> RelNode would have two additional methods: it would expose a trait
> propagation function (as described above) and optionally expose one or more
> specific traits this node desired. When the planner arrived at a
> conclusion, you'd run the heuristic cycle to further propagate desired
> traits (if possible) and then restart the planning cycle based on any new
> transformations done during the heuristic stage. You'd then repeat this
> volcano/trait prop cycle until you arrive at a "completed" state.
>
> We never actually got to implementation but I'm super supportive of someone
> picking this up.
>
>
>
> On Sat, Oct 19, 2019 at 12:25 AM Stamatis Zampetakis <za...@gmail.com>
> wrote:
>
> > Thanks all for the very interesting usecases and helpful examples.
> >
> > I would like to stay a bit on the fact that logical operators do not have
> > physical traits. Calcite's logical operators do have at least one
> physical
> > trait which is Convention.NONE. Other logical operators such as:
> >
> > LogicalTableScan [1]
> > LogicalFilter [2]
> > LogicalProject [3]
> > LogicalWindow [4]
> >
> > have additional traits regarding collation and distribution. There is
> > already some sort of trait derivation so to some extend it is possible to
> > check the traitset of the child (logical) operator before requesting some
> > other traitset when creating the parent (physical).
> >
> > I see that this mechanism of adding explicitly traits to logical
> operators
> > may be confusing and may also lead to planning problems. Replacing it by
> > metadata might be a good idea and it is closer to the idea of
> > "applicability function" mentioned in the Volcano paper. Assuming that we
> > follow this approach I would assume that the traitset of logical
> operators
> > from now on should be always empty.
> >
> > Is this what you have in mind Haisheng?
> >
> > Best,
> > Stamatis
> >
> > [1]
> >
> >
> https://github.com/apache/calcite/blob/cd24cae77072e56e4333d10114bf380be79709f1/core/src/main/java/org/apache/calcite/rel/logical/LogicalTableScan.java#L95
> > [2]
> >
> >
> https://github.com/apache/calcite/blob/cd24cae77072e56e4333d10114bf380be79709f1/core/src/main/java/org/apache/calcite/rel/logical/LogicalFilter.java#L105
> > [3]
> >
> >
> https://github.com/apache/calcite/blob/cd24cae77072e56e4333d10114bf380be79709f1/core/src/main/java/org/apache/calcite/rel/logical/LogicalProject.java#L104
> > [4]
> >
> >
> https://github.com/apache/calcite/blob/cd24cae77072e56e4333d10114bf380be79709f1/core/src/main/java/org/apache/calcite/rel/logical/LogicalWindow.java#L95
> >
> > On Sat, Oct 19, 2019 at 7:39 AM Xiening Dai <xn...@gmail.com> wrote:
> >
> > > Thanks for the sharing. I like the way you model this problem, Jinfeng.
> > >
> > > There’s one minor issue with your example. Let say if R and S doesn’t
> > have
> > > sorting properties at all. In your case, we would end up adding
> enforcers
> > > for LHS and RHS to get collation (a, b, c). Then we would need another
> > > enforcer to get collation (b, c). This is a sub optimal plan as we
> could
> > > have use (b, c, a) for join.
> > >
> > > I think in step #2, the join operator would need to take the agg trait
> > > requirement into account. Then it would have two options -
> > >
> > > 1) require *exact/super* match of (b, c, a) or (c, b, a); this is to
> > > guarantee the join output would deliver the collation agg needs.
> > > 2) require permutation match of (a, b, c); in such case, an enforcer
> > might
> > > be needed for aggregation.
> > >
> > > Eventually the cost model decides who is the winner.
> > >
> > > There’s a fundamental difference between your model and Haisheng’s
> > > proposal. In Haisheng’s case, a rel node not only looks at its parent’s
> > > requirement, but also tries to get the potential traits its input could
> > > deliver. It would try to align them to eliminate unnecessary
> > alternatives.
> > >
> > > In above example, assuming R is (b, c, a) and S is (a, b, c), to
> > implement
> > > option 1), we would generate two alternatives -
> > >
> > > MergeJoin (b, c, a)
> > > TableScan R
> > > Sort(b, c, a)
> > > TableScan S
> > >
> > > MergeJoin(c, b, a)
> > > Sort(c, b, a)
> > > TableScan R
> > > Sort(c, b, a)
> > > TableScan S
> > >
> > > But if we look at the input traits and has the insight that R already
> > > delivers (b, c, a), we could decide to require (b, c, a) only and avoid
> > > generating the 2nd plan, which is definitely worse, and reduce the
> search
> > > space.
> > >
> > >
> > > > On Oct 18, 2019, at 4:57 PM, Jinfeng Ni <jn...@apache.org> wrote:
> > > >
> > > > A little bit of history. In Drill, when we first implemented
> > > > Distribution trait's definition, we allows both exact match and
> > > > partial match in satisfy() method. This works fine for single-input
> > > > operator such aggregation, however it leads to incorrect plan for
> join
> > > > query, i.e LHS shuffle with (a, b), RHS shuffle with (a) . At that
> > > > time, we removed partial match, and use exact match only. Yet this
> > > > changes leads to unnecessary additional exchange. To mitigate this
> > > > problem, in join physical operator, for a join key (a, b, c), we
> > > > enumerate different distribution requests, yet this lead to more
> space
> > > > to explore and significantly increase planning time (which is
> probably
> > > > what Haisheng also experienced). When I look back, I feel probably
> > > > what we miss is the "coordination" step in the join operator, because
> > > > if we relax the requirement of satisfy(), for multi-input operators,
> > > > we have to enforce some "coordination", to make sure multiple input's
> > > > trait could work together properly.
> > > >
> > > >
> > > >
> > > > On Fri, Oct 18, 2019 at 4:38 PM Jinfeng Ni <jn...@apache.org> wrote:
> > > >>
> > > >> This is an interesting topic. Thanks for bringing up this issue.
> > > >>
> > > >> My understanding of Volcano planner is it works in a top-down search
> > > >> mode (the parent asks for certain trait of its child), while the
> trait
> > > >> propagates in a bottom-up way, as Stamatis explained.
> > > >>
> > > >> IMHO, the issue comes down to the definition of RelTrait, how to
> > > >> determine if a trait A could satisfy a request asking for trait B,
> > > >> that is, how RelTrait.satisfies() method is implemented.
> > > >>
> > > >> Let's first clarify different situations, using collation as
> example.
> > > >> 1) The collation is requested by query's outmost ORDER BY clause.
> > > >> - The generated plan has to have "exact match", i.e same collation
> > > >> (same column sequence), or "super match" .
> > > >> exact match: (a, b) satisfy (a, b)
> > > >> super match: (a, b, c) satisfy (a, b)
> > > >>
> > > >> 2) The collation is requested by operand with single input, such as
> > > >> sort-based Aggregation.
> > > >> - In such case, a "permutation match" is sufficient.
> > > >> For instance, for Aggregation (b,c), input with collation (c, b)
> > > >> could satisfy the requirement.
> > > >> permutation match: (b, c) satisfy (c, b). (c, b) satisfy (c,
> > b)
> > > >> permutation match: (b, c, a) satisfy (c, b). (c, b, a) satisfy
> > (c,
> > > b)
> > > >>
> > > >> 3) The collation is requested by operand with >= 2 inputs, such as
> > > >> sort-based MergeJoin.
> > > >> - A permutation match is sufficient for each input
> > > >> - MergeJoin has to do coordination, after input's trait propagates
> > > >> upwards. In other words, ensure both inputs's permutation match are
> > > >> actually same sequence. Otherwise, enforcer could be inserted upon
> > > >> each input, and the planner generates two plans and let the cost
> > > >> decide.
> > > >>
> > > >> For the first case, this is how today's RelCollation's satisfy()
> > > >> method is implemented.
> > > >>
> > > >> For the second / third cases, use Haisheng's example,
> > > >>
> > > >> SELECT DISTINCT c, b FROM
> > > >> ( SELECT R.c c, S.b b FROM R, S
> > > >> WHERE R.a=S.a and R.b=S.b and R.c=S.c) t;
> > > >>
> > > >> Aggregate . (c, b)
> > > >> +--- MergeJoin . (a, b, c)
> > > >> |--- TableScan on R
> > > >> +--- TableScan on S
> > > >>
> > > >> Here is the steps that might take place in the planner:
> > > >>
> > > >> 1) Aggregate request permutation match collation (c, b)
> > > >> 2) MergeJoin request a permutation match of (a, b,c) on both it's
> > input
> > > >> 3) R respond with collation (c, b, a), which satisfy MergeJoin's LHS
> > > requirement
> > > >> 4) S respond with collation (b, c, a), which satisfy MergeJoins' RHS
> > > requirement
> > > >> 5) MergeJoin do a coordination o LHS, RHS, and generate two possible
> > > plans
> > > >> MJ1: Insert a sort of (c, b, a) on RHS. This MJ operator now has
> > > >> collation of (c, b, a)
> > > >> MJ2: Insert a sort of (b, c, a) on LHS. This MJ operator now has
> > > >> collation of (b, c, a)
> > > >> 6) MJ1 and MJ2 could both satisfy permutation match request in step
> > > >> 1, leading to two possible plans:
> > > >> Agg1: with input of MJ1
> > > >> Agg2: with input of MJ2
> > > >> 7) planner chooses a best plan based on cost of Agg1 and Agg2.
> > > >>
> > > >> I should point that the enforcer sort inserted in step 5 could help
> > > >> remove redundant sort in its input, if the input's collation is
> > > >> obtained from sort, by invoking Calcite's SortRemove Rule.
> > > >>
> > > >> The above only considers the column sequence. The DESC/ASC, NULL
> > > >> FIRST/LAST will add more complexity, but we probably use similar
> idea.
> > > >>
> > > >> In summary, we need :
> > > >> 1) redefine collation trait's satisfy() policy, exact match, super
> > > >> match, permutation match,
> > > >> 2) different physical operator applies different trait matching
> > > >> policy, depending on operator's # of inputs, and algorithm
> > > >> implementation.
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> On Fri, Oct 18, 2019 at 2:51 PM Haisheng Yuan <
> h.yuan@alibaba-inc.com
> > >
> > > wrote:
> > > >>>
> > > >>> Hi Stamatis,
> > > >>>
> > > >>> Thanks for your comment. I think my example didn't make it clear.
> > > >>>
> > > >>> When a logical operator is created, it doesn't have any physical,
> > > >>> propertyand it shouldn't have. When a physical operator is created,
> > > >>> e.g. in Enumerable convention, it only creates an intuitive
> traitset
> > > >>> with it, and requests it children the corresponding ones.
> > > >>>
> > > >>> For operators such as Join, Aggregate, Window, which may deliver
> > > >>> multiple different traitsets, when the parent operator is created
> and
> > > >>> request its traitset, it might be good to know what are the
> poosible
> > > >>> traitset that the child operator can deliver. e.g.
> > > >>>
> > > >>> SELECT DISTINCT c, b FROM
> > > >>> ( SELECT R.c c, S.b b FROM R, S
> > > >>> WHERE R.a=S.a and R.b=S.b and R.c=S.c) t;
> > > >>>
> > > >>> Suppose R is ordered by (c, b, a), and S is ordered by (b, c, a).
> > > >>> Here is the logical plan:
> > > >>> Aggregate
> > > >>> +--- InnerJoin
> > > >>> |--- TableScan on R
> > > >>> +--- TableScan on S
> > > >>>
> > > >>> When we create a physical merge join for the inner join, it may
> just
> > > >>> have collation sorted on a,b,c. Then the aggreate on top of join
> will
> > > >>> request another sort on c,b, thus we miss the best plan. What we
> > > >>> can do is requesting all the order combinations, which is n!, like
> > > >>> how the Values operator does. But that is too much.
> > > >>>
> > > >>> If we can provide an approach that can minimize the possiple
> traitset
> > > >>> that the child operator may deliver, we can reduce the chance of
> > > missing
> > > >>> good plans. For the above query, the Aggregate operator can derive
> > > >>> possible traitsets that its child operator join can deliver, in
> which
> > > case,
> > > >>> the possiple traitsets of join is
> > > >>> 1. collation on (a,b,c) based on join condition,
> > > >>> 2. collation on (c,b,a) based on left child,
> > > >>> 3. collation on (b,c,a) based on right child
> > > >>> So we can request Aggregate sorted by (c,b) and Join sorted by
> > (c,b,a).
> > > >>> The number of traiset requests and plan alternatives can be
> reduced.
> > > >>> The DerivedTraitSets can be used to derive the possible traitsets
> > from
> > > >>> Join, and pass through Project, Filter etc...
> > > >>>
> > > >>> This is just an example of non-distributed system, for distributed
> > > system,
> > > >>> it can save much more by considering the possible distribution
> > > delivered
> > > >>> by child operators.
> > > >>>
> > > >>> One thing that concerns me is it highly relies on the traiset
> system
> > > of the
> > > >>> underlying physical system. Like Enumerable doesn't consider
> > > distribution,
> > > >>> because it is single-node system, but Hive/Flink are distributed
> > > system.
> > > >>> - Haisheng
> > > >>>
> > > >>> ------------------------------------------------------------------
> > > >>> 发件人：Stamatis Zampetakis<za...@gmail.com>
> > > >>> 日 期：2019年10月18日 14:53:41
> > > >>> 收件人：<de...@calcite.apache.org>
> > > >>> 主 题：Re: [DISCUSS] On-demand traitset request
> > > >>>
> > > >>> Hi Haisheng,
> > > >>>
> > > >>> This is an interesting topic but somehow in my mind I thought that
> > this
> > > >>> mechanism is already in place.
> > > >>>
> > > >>> When an operator (logical or physical) is created its traitset is
> > > >>> determined in bottom-up fashion using the create
> > > >>> static factory method present in almost all operators. In my mind
> > this
> > > is
> > > >>> in some sense the applicability function
> > > >>> mentioned in [1].
> > > >>>
> > > >>> Now during optimization we proceed in top-down manner and we
> request
> > > >>> certain traitsets from the operators.
> > > >>> If it happens and they contain already the requested traits nothing
> > > needs
> > > >>> to be done.
> > > >>>
> > > >>> In your example when we are about to create the sort-merge join we
> > can
> > > >>> check what traitsets are present in the inputs
> > > >>> and if possible request those. Can you elaborate a bit more why do
> we
> > > need
> > > >>> a new type of metadata?
> > > >>>
> > > >>> Anyway if we cannot do it at the moment it makes sense to complete
> > the
> > > >>> missing bits since what you are describing
> > > >>> was already mentioned in the original design of the Volcano
> optimizer
> > > [1].
> > > >>>
> > > >>> "If a move to be pursued is the exploration of a normal query
> > > processing
> > > >>> algorithm such as merge-join, its cost is calculated by the
> > algorithm's
> > > >>> cost function. The algorithm's applicability function determines
> the
> > > >>> physical properly vectors for the algorithms inputs, and their
> costs
> > > and
> > > >>> optimal plans are found by invoking FindBestPlan for the inputs.
> For
> > > some
> > > >>> binary operators, the actual physical properties of the inputs are
> > not
> > > as
> > > >>> important as the consistency of physical properties among the
> inputs.
> > > For
> > > >>> example, for a sort-based implementation of intersection, i.e., an
> > > >>> algorithm very similar to merge-join, any sort order of the two
> > inputs
> > > will
> > > >>> suffice as long as the two inputs are sorted in the same way.
> > > Similarly,
> > > >>> for a parallel join, any partitioning of join inputs across
> multiple
> > > >>> processing nodes is acceptable if both inputs are partitioned using
> > > >>> Compatible partitioning rules. For these cases, the search engine
> > > permits
> > > >>> the optimizer implementor to specify a number of physical property
> > > vectors
> > > >>> to be tried. For example, for the intersection of two inputs R and
> S
> > > with
> > > >>> attributes A, B, and C where R is sorted on (A,B,C) and S is sorted
> > on
> > > >>> (B,A,C), both these sort orders can be specified by the optimizer
> > > >>> implementor and will be optimized by the generated optimizer, while
> > > other
> > > >>> possible sort orders, e.g., (C,B,A), will be ignored. " [1]
> > > >>>
> > > >>> Best,
> > > >>> Stamatis
> > > >>>
> > > >>> [1]
> > > >>>
> > >
> >
> https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Volcano-graefe.pdf
> > > >>>
> > > >>> On Fri, Oct 18, 2019 at 4:56 AM Haisheng Yuan <
> > h.yuan@alibaba-inc.com>
> > > >>> wrote:
> > > >>>
> > > >>>> TL;DR
> > > >>>> Both top-down physical TraitSet request and bottom-up TraitSet
> > > >>>> derivation have their strongth and weakness, we propose
> > > >>>> on-demand TraitSet request to combine the above two, to reduce
> > > >>>> the number of plan alternatives that are genereated, especially
> > > >>>> in distributed system.
> > > >>>>
> > > >>>> e.g.
> > > >>>> select * from foo join bar on f1=b1 and f2=b2 and f3=b3;
> > > >>>>
> > > >>>> In non-distributed system, we can generate a sort merge join,
> > > >>>> requesting foo sorted by f1,f2,f3 and bar sorted by b1,b2,b3.
> > > >>>> But if foo happens to be sorted by f3,f2,f1, we may miss the
> > > >>>> chance of making use of the delivered ordering of foo. Because
> > > >>>> if we require bar to be sorted by b3,b2,b1, we don't need to
> > > >>>> sort on foo anymore. There are so many choices, n!, not even
> > > >>>> considering asc/desc and null direction. We can't request all
> > > >>>> the possible traitsets in top-down way, and can't derive all the
> > > >>>> possible traitsets in bottom-up way either.
> > > >>>>
> > > >>>> We propose on-demand traitset request by adding a new type
> > > >>>> of metadata DerivedTraitSets into the built-in metadata system.
> > > >>>>
> > > >>>> List<RelTraitSet> deriveTraitSets(RelNode, RelMetadataQuery)
> > > >>>>
> > > >>>> In this metadata, every operator returns several possbile
> traitsets
> > > >>>> that may be derived from this operator.
> > > >>>>
> > > >>>> Using above query as an example, the tablescan on foo should
> > > >>>> return traiset with collation on f3, f2, f1.
> > > >>>>
> > > >>>> In physical implementation rules, e.g. the SortMergeJoinRule,
> > > >>>> it gets possible traitsets from both child operators, uses the
> join
> > > >>>> keys to eliminate useless traitsets, leaves out usefull traitsets,
> > > >>>> and requests corresponding traitset on the other child.
> > > >>>>
> > > >>>> This relies on the feature of AbstractConverter, which is turned
> > > >>>> off by default, due to performance issue [1].
> > > >>>>
> > > >>>> Thoughts?
> > > >>>>
> > > >>>> [1] https://issues.apache.org/jira/browse/CALCITE-2970
> > > >>>>
> > > >>>> Haisheng
> > > >>>>
> > > >>>>
> > > >>>
> > >
> > >
> >
>
>

Re: Re: Re: Re: [DISCUSS] On-demand traitset request

Posted by Haisheng Yuan <h....@alibaba-inc.com>.

Yes, you are right, they are similar. But metadata framework can't provide the flexibility for the optimization engine to extend to other trait without modifying the core. Our goal should be making the optimization engine do all the work, physical operators just specify its behavior and characteristics.

In SQL Server, there is another physical property called Rewindability, which requires a physical operator to be rewindable. If we want to add this trait, as I said in last email [1], users just need to override the derive method to consider Rewindability:
public <T extends RelTrait> T derivedTrait(RelTraitDef<T> traitDef)

If using metadata, users have to define their own metadata for the added trait, like RelMdRewindability and modify the core to call the method, which is not ideal. But we can leverage the RelMdDistribution and RelMdCollation if applicable in derivedTrait methods. However, I don't agree to the design of RelMdCollation to return a list of collations. And I don't think the property requirement and derivation of a physical operator should be scattered to different places, which is a minor thing, though.


[1] http://mail-archives.apache.org/mod_mbox/calcite-dev/201910.mbox/<f7...@alibaba-inc.com>
- Haisheng

On Fri, Oct 25, 2019 at 08:36：07 GMT Stamatis Zampetakis <za...@gmail.com> wrote:
I would like a further clarification regarding the methods:
derivedDistribution()
derivedCollation()

What would be the difference with the existing derivation mechanism in
RelMdDistribution [1], and RelMdCollation [2].
They are not sufficient to provide the necessary information?

[1]
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/metadata/RelMdDistribution.java
[2]
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/metadata/RelMdCollation.java


 ------------------Original Mail ------------------
Sender:Haisheng Yuan <h....@alibaba-inc.com>
Send Date:Fri Oct 25 12:01:40 2019
Recipients:dev@calcite.apache.org (dev@calcite.apache.org) <de...@calcite.apache.org>
Subject:Re: Re: Re: [DISCUSS] On-demand traitset request

I didn't say adding to RelNode, but a new API/interface for physical operator only.

What matters is not the number of interfaces, but the necessity of these methods.

- Haisheng

------------------------------------------------------------------
发件人：Danny Chan<yu...@gmail.com>
日　期：2019年10月25日 09:55:56
收件人：<de...@calcite.apache.org>
主　题：Re: Re: [DISCUSS] On-demand traitset request

I have the same feeling, it seems to much interfaces for the physical node(we do not really have physical class for physical nodes yet), so these interfaces may just be put into the RelNode, that was too complex and to much for me, can we have a way that do not modify the nodes itself ?

Best,
Danny Chan
在 2019年10月23日 +0800 PM2:53，Stamatis Zampetakis <za...@gmail.com>，写道：
> Overall, I agree that better encapsulation of propagation and derivation of
> traits would be beneficial for our system.
>
> Regarding the API proposed by Haisheng, I have to think a bit more on it.
> At first glance, adding such methods directly in the RelNode API does not
> appear an ideal solution since I don't see how easily it can be extended to
> support other kinds of traits.
>
> Best,
> Stamatis
>
> On Mon, Oct 21, 2019 at 7:31 AM Haisheng Yuan <h....@alibaba-inc.com>
> wrote:
>
> > To Stamatis,
> > Not exactly. My initial thought was giving the physical operator the
> > abiity to customize and fully control physical property derivation
> > strategy, thus can further help the purpose driven trait request. But since
> > we agree to think more high-level API to support on-demand traitset
> > request, I will illustrate what API is expected from implentator's
> > perspective.
> >
> > Jingfeng gave us basic steps on how the plan might be generated using
> > top-down purpose driven only manner, I think differently with the first
> > several steps.
> >
> > SELECT DISTINCT c, b FROM
> > ( SELECT R.c c, S.b b FROM R, S
> > WHERE R.a=S.a and R.b=S.b and R.c=S.c) t;
> >
> > Aggregate . (c, b)
> > +--- MergeJoin . (a, b, c)
> > |--- TableScan on R
> > +-- TableScan on S
> >
> > 1. Aggreate require collation (c,b) from its child, not permutation.
> > 2. MergeJoin's parent require (c,b), it has 2 options. Pass it down, or
> > ignore it.
> > a) Pass down. it has join condition on (a,b,c), the required columns
> > can be coverd by join condition columns, so MergeJoin will try to deliver
> > (c,b,a), and both children must exact match. Then we will have sort on both
> > children of MergeJoin.
> > b) Ignore it. Require its first child collation on (a,b,c), but
> > matching type is subset. R delivers (c,b,a). Then using the first child's
> > derived collation trait to require its second child to exact match. Thus we
> > have a sort on S, and a sort on top of MergeJoin.
> >
> > Both plan might be good or bad. If R, S are large, but the join result is
> > small, plan b) might be better, otherwise plan a) might be better.
> >
> > Anyway, I hope the physical operators can have full control the physical
> > properties requests and derivation, in physical operator class itself, not
> > rules, not other places.
> >
> > Per our experience, we have spent too much time on writing code for
> > dealing with all kinds of property requirement and derivation. But in fact,
> > life should be easier. I would like to the physical operator provides the
> > following API, and the 3rd party implementator just need to
> > override/implement them, no more need to be taken care.
> >
> > 1. void setDistributionRequests(int numReq)
> > Each operator can specify how many optimzation requests on some trait it
> > want to do. e.g. HashJoin may request the following distribution on both
> > children:
> > - (hash distribution on key1, hash distribution on key1)
> > - (hash distribution on key2, hash distribution on key2)
> > - (hash distribution on all keys, hash distribution on all keys)
> > - (Any, Broadcast)
> > - (Gather, Gather)
> >
> > 2. RelDistribution requiredDistribution(RelDistribution required, int
> > child) //same for collation
> > Given the required distribution from parent operator, returns the required
> > distribution for its nth child.
> >
> > 3. RelDistribution derivedDistribution() //same for collation
> > Derive the distribution of the operator itelf from child operators.
> >
> > 4. MatchType distributionMatchType(int child) //same for collation
> > Returns the distribution match type for its nth child, how does it match
> > the other children.
> > Similar with Jinfeng's point, I think there should be 3 types of matching:
> > exact, satisfy, subset.
> > e.g.
> > R is distributed by (a), S is distributed by (a,b)
> > select * from R join S using a,b,c
> > If we have plan
> > HashJoin
> > |-- TableScan on R
> > +-- TableScan on S
> > We may require the match type on S to be satisfy. (a,b) satisfies required
> > distribution (a,b,c).
> > Fot the outer child R, we require it to be exact match with inner.
> >
> > 5. ExecOrder getExecOrder()
> > Returns how the operator's children is executed, left to right, or right
> > to left. Typically, hash join is right to left. We might use this as the
> > optimization order. To make sure we have correct plans, we have to optimize
> > child and enforce properties in the order that is specific to the physical
> > operator.
> > All the other dirty work should be done by the optimization engine, but
> > not through rules, I believe. However, I havn't got any clear plan on how
> > to achieve it inside the engine.
> >
> > Haisheng
> >
> > ------------------------------------------------------------------
> > 发件人：Jacques Nadeau<ja...@apache.org>
> > 日 期：2019年10月21日 11:04:19
> > 收件人：<de...@calcite.apache.org>
> > 主 题：Re: [DISCUSS] On-demand traitset request
> >
> > Definitely agree that this has been a long time missing. I've been
> > challenged by this absence since before Calcite was Calcite. I also
> > remember the trials and tribulations around this that Jinfeng references
> > above.
> >
> > In general, I think the first thing one might want to before actually doing
> > this is to make trait derivation internally defined based on the impact
> > that a rel node has on traits. I've always found the externally provided
> > rel traits to be problematic and a potential place for hidden bugs (row
> > type has the same problem) . It means that trait derivation of a relnode is
> > based on the rules that do transformation as opposed to the "physical"
> > impact of the relnode. (It also leads to derivation behavior for a relnode
> > being scattered in many different rules.) If moved to the rel node, it also
> > provides a second benefit, once you encapsulate this propagation logic, you
> > could also expose this as a trait derivation function that the planner
> > could use to seek out derivation paths.
> >
> > At Dremio we toyed last year with the idea of adding a heuristic cycle on
> > top of the existing volano planner and relset state. In this model a
> > RelNode would have two additional methods: it would expose a trait
> > propagation function (as described above) and optionally expose one or more
> > specific traits this node desired. When the planner arrived at a
> > conclusion, you'd run the heuristic cycle to further propagate desired
> > traits (if possible) and then restart the planning cycle based on any new
> > transformations done during the heuristic stage. You'd then repeat this
> > volcano/trait prop cycle until you arrive at a "completed" state.
> >
> > We never actually got to implementation but I'm super supportive of someone
> > picking this up.
> >
> >
> >
> > On Sat, Oct 19, 2019 at 12:25 AM Stamatis Zampetakis <za...@gmail.com>
> > wrote:
> >
> > > Thanks all for the very interesting usecases and helpful examples.
> > >
> > > I would like to stay a bit on the fact that logical operators do not have
> > > physical traits. Calcite's logical operators do have at least one
> > physical
> > > trait which is Convention.NONE. Other logical operators such as:
> > >
> > > LogicalTableScan [1]
> > > LogicalFilter [2]
> > > LogicalProject [3]
> > > LogicalWindow [4]
> > >
> > > have additional traits regarding collation and distribution. There is
> > > already some sort of trait derivation so to some extend it is possible to
> > > check the traitset of the child (logical) operator before requesting some
> > > other traitset when creating the parent (physical).
> > >
> > > I see that this mechanism of adding explicitly traits to logical
> > operators
> > > may be confusing and may also lead to planning problems. Replacing it by
> > > metadata might be a good idea and it is closer to the idea of
> > > "applicability function" mentioned in the Volcano paper. Assuming that we
> > > follow this approach I would assume that the traitset of logical
> > operators
> > > from now on should be always empty.
> > >
> > > Is this what you have in mind Haisheng?
> > >
> > > Best,
> > > Stamatis
> > >
> > > [1]
> > >
> > >
> > https://github.com/apache/calcite/blob/cd24cae77072e56e4333d10114bf380be79709f1/core/src/main/java/org/apache/calcite/rel/logical/LogicalTableScan.java#L95
> > > [2]
> > >
> > >
> > https://github.com/apache/calcite/blob/cd24cae77072e56e4333d10114bf380be79709f1/core/src/main/java/org/apache/calcite/rel/logical/LogicalFilter.java#L105
> > > [3]
> > >
> > >
> > https://github.com/apache/calcite/blob/cd24cae77072e56e4333d10114bf380be79709f1/core/src/main/java/org/apache/calcite/rel/logical/LogicalProject.java#L104
> > > [4]
> > >
> > >
> > https://github.com/apache/calcite/blob/cd24cae77072e56e4333d10114bf380be79709f1/core/src/main/java/org/apache/calcite/rel/logical/LogicalWindow.java#L95
> > >
> > > On Sat, Oct 19, 2019 at 7:39 AM Xiening Dai <xn...@gmail.com> wrote:
> > >
> > > > Thanks for the sharing. I like the way you model this problem, Jinfeng.
> > > >
> > > > There’s one minor issue with your example. Let say if R and S doesn’t
> > > have
> > > > sorting properties at all. In your case, we would end up adding
> > enforcers
> > > > for LHS and RHS to get collation (a, b, c). Then we would need another
> > > > enforcer to get collation (b, c). This is a sub optimal plan as we
> > could
> > > > have use (b, c, a) for join.
> > > >
> > > > I think in step #2, the join operator would need to take the agg trait
> > > > requirement into account. Then it would have two options -
> > > >
> > > > 1) require *exact/super* match of (b, c, a) or (c, b, a); this is to
> > > > guarantee the join output would deliver the collation agg needs.
> > > > 2) require permutation match of (a, b, c); in such case, an enforcer
> > > might
> > > > be needed for aggregation.
> > > >
> > > > Eventually the cost model decides who is the winner.
> > > >
> > > > There’s a fundamental difference between your model and Haisheng’s
> > > > proposal. In Haisheng’s case, a rel node not only looks at its parent’s
> > > > requirement, but also tries to get the potential traits its input could
> > > > deliver. It would try to align them to eliminate unnecessary
> > > alternatives.
> > > >
> > > > In above example, assuming R is (b, c, a) and S is (a, b, c), to
> > > implement
> > > > option 1), we would generate two alternatives -
> > > >
> > > > MergeJoin (b, c, a)
> > > > TableScan R
> > > > Sort(b, c, a)
> > > > TableScan S
> > > >
> > > > MergeJoin(c, b, a)
> > > > Sort(c, b, a)
> > > > TableScan R
> > > > Sort(c, b, a)
> > > > TableScan S
> > > >
> > > > But if we look at the input traits and has the insight that R already
> > > > delivers (b, c, a), we could decide to require (b, c, a) only and avoid
> > > > generating the 2nd plan, which is definitely worse, and reduce the
> > search
> > > > space.
> > > >
> > > >
> > > > > On Oct 18, 2019, at 4:57 PM, Jinfeng Ni <jn...@apache.org> wrote:
> > > > >
> > > > > A little bit of history. In Drill, when we first implemented
> > > > > Distribution trait's definition, we allows both exact match and
> > > > > partial match in satisfy() method. This works fine for single-input
> > > > > operator such aggregation, however it leads to incorrect plan for
> > join
> > > > > query, i.e LHS shuffle with (a, b), RHS shuffle with (a) . At that
> > > > > time, we removed partial match, and use exact match only. Yet this
> > > > > changes leads to unnecessary additional exchange. To mitigate this
> > > > > problem, in join physical operator, for a join key (a, b, c), we
> > > > > enumerate different distribution requests, yet this lead to more
> > space
> > > > > to explore and significantly increase planning time (which is
> > probably
> > > > > what Haisheng also experienced). When I look back, I feel probably
> > > > > what we miss is the "coordination" step in the join operator, because
> > > > > if we relax the requirement of satisfy(), for multi-input operators,
> > > > > we have to enforce some "coordination", to make sure multiple input's
> > > > > trait could work together properly.
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Oct 18, 2019 at 4:38 PM Jinfeng Ni <jn...@apache.org> wrote:
> > > > > >
> > > > > > This is an interesting topic. Thanks for bringing up this issue.
> > > > > >
> > > > > > My understanding of Volcano planner is it works in a top-down search
> > > > > > mode (the parent asks for certain trait of its child), while the
> > trait
> > > > > > propagates in a bottom-up way, as Stamatis explained.
> > > > > >
> > > > > > IMHO, the issue comes down to the definition of RelTrait, how to
> > > > > > determine if a trait A could satisfy a request asking for trait B,
> > > > > > that is, how RelTrait.satisfies() method is implemented.
> > > > > >
> > > > > > Let's first clarify different situations, using collation as
> > example.
> > > > > > 1) The collation is requested by query's outmost ORDER BY clause.
> > > > > > - The generated plan has to have "exact match", i.e same collation
> > > > > > (same column sequence), or "super match" .
> > > > > > exact match: (a, b) satisfy (a, b)
> > > > > > super match: (a, b, c) satisfy (a, b)
> > > > > >
> > > > > > 2) The collation is requested by operand with single input, such as
> > > > > > sort-based Aggregation.
> > > > > > - In such case, a "permutation match" is sufficient.
> > > > > > For instance, for Aggregation (b,c), input with collation (c, b)
> > > > > > could satisfy the requirement.
> > > > > > permutation match: (b, c) satisfy (c, b). (c, b) satisfy (c,
> > > b)
> > > > > > permutation match: (b, c, a) satisfy (c, b). (c, b, a) satisfy
> > > (c,
> > > > b)
> > > > > >
> > > > > > 3) The collation is requested by operand with >= 2 inputs, such as
> > > > > > sort-based MergeJoin.
> > > > > > - A permutation match is sufficient for each input
> > > > > > - MergeJoin has to do coordination, after input's trait propagates
> > > > > > upwards. In other words, ensure both inputs's permutation match are
> > > > > > actually same sequence. Otherwise, enforcer could be inserted upon
> > > > > > each input, and the planner generates two plans and let the cost
> > > > > > decide.
> > > > > >
> > > > > > For the first case, this is how today's RelCollation's satisfy()
> > > > > > method is implemented.
> > > > > >
> > > > > > For the second / third cases, use Haisheng's example,
> > > > > >
> > > > > > SELECT DISTINCT c, b FROM
> > > > > > ( SELECT R.c c, S.b b FROM R, S
> > > > > > WHERE R.a=S.a and R.b=S.b and R.c=S.c) t;
> > > > > >
> > > > > > Aggregate . (c, b)
> > > > > > +--- MergeJoin . (a, b, c)
> > > > > > |--- TableScan on R
> > > > > > +--- TableScan on S
> > > > > >
> > > > > > Here is the steps that might take place in the planner:
> > > > > >
> > > > > > 1) Aggregate request permutation match collation (c, b)
> > > > > > 2) MergeJoin request a permutation match of (a, b,c) on both it's
> > > input
> > > > > > 3) R respond with collation (c, b, a), which satisfy MergeJoin's LHS
> > > > requirement
> > > > > > 4) S respond with collation (b, c, a), which satisfy MergeJoins' RHS
> > > > requirement
> > > > > > 5) MergeJoin do a coordination o LHS, RHS, and generate two possible
> > > > plans
> > > > > > MJ1: Insert a sort of (c, b, a) on RHS. This MJ operator now has
> > > > > > collation of (c, b, a)
> > > > > > MJ2: Insert a sort of (b, c, a) on LHS. This MJ operator now has
> > > > > > collation of (b, c, a)
> > > > > > 6) MJ1 and MJ2 could both satisfy permutation match request in step
> > > > > > 1, leading to two possible plans:
> > > > > > Agg1: with input of MJ1
> > > > > > Agg2: with input of MJ2
> > > > > > 7) planner chooses a best plan based on cost of Agg1 and Agg2.
> > > > > >
> > > > > > I should point that the enforcer sort inserted in step 5 could help
> > > > > > remove redundant sort in its input, if the input's collation is
> > > > > > obtained from sort, by invoking Calcite's SortRemove Rule.
> > > > > >
> > > > > > The above only considers the column sequence. The DESC/ASC, NULL
> > > > > > FIRST/LAST will add more complexity, but we probably use similar
> > idea.
> > > > > >
> > > > > > In summary, we need :
> > > > > > 1) redefine collation trait's satisfy() policy, exact match, super
> > > > > > match, permutation match,
> > > > > > 2) different physical operator applies different trait matching
> > > > > > policy, depending on operator's # of inputs, and algorithm
> > > > > > implementation.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, Oct 18, 2019 at 2:51 PM Haisheng Yuan <
> > h.yuan@alibaba-inc.com
> > > >
> > > > wrote:
> > > > > > >
> > > > > > > Hi Stamatis,
> > > > > > >
> > > > > > > Thanks for your comment. I think my example didn't make it clear.
> > > > > > >
> > > > > > > When a logical operator is created, it doesn't have any physical,
> > > > > > > propertyand it shouldn't have. When a physical operator is created,
> > > > > > > e.g. in Enumerable convention, it only creates an intuitive
> > traitset
> > > > > > > with it, and requests it children the corresponding ones.
> > > > > > >
> > > > > > > For operators such as Join, Aggregate, Window, which may deliver
> > > > > > > multiple different traitsets, when the parent operator is created
> > and
> > > > > > > request its traitset, it might be good to know what are the
> > poosible
> > > > > > > traitset that the child operator can deliver. e.g.
> > > > > > >
> > > > > > > SELECT DISTINCT c, b FROM
> > > > > > > ( SELECT R.c c, S.b b FROM R, S
> > > > > > > WHERE R.a=S.a and R.b=S.b and R.c=S.c) t;
> > > > > > >
> > > > > > > Suppose R is ordered by (c, b, a), and S is ordered by (b, c, a).
> > > > > > > Here is the logical plan:
> > > > > > > Aggregate
> > > > > > > +--- InnerJoin
> > > > > > > |--- TableScan on R
> > > > > > > +--- TableScan on S
> > > > > > >
> > > > > > > When we create a physical merge join for the inner join, it may
> > just
> > > > > > > have collation sorted on a,b,c. Then the aggreate on top of join
> > will
> > > > > > > request another sort on c,b, thus we miss the best plan. What we
> > > > > > > can do is requesting all the order combinations, which is n!, like
> > > > > > > how the Values operator does. But that is too much.
> > > > > > >
> > > > > > > If we can provide an approach that can minimize the possiple
> > traitset
> > > > > > > that the child operator may deliver, we can reduce the chance of
> > > > missing
> > > > > > > good plans. For the above query, the Aggregate operator can derive
> > > > > > > possible traitsets that its child operator join can deliver, in
> > which
> > > > case,
> > > > > > > the possiple traitsets of join is
> > > > > > > 1. collation on (a,b,c) based on join condition,
> > > > > > > 2. collation on (c,b,a) based on left child,
> > > > > > > 3. collation on (b,c,a) based on right child
> > > > > > > So we can request Aggregate sorted by (c,b) and Join sorted by
> > > (c,b,a).
> > > > > > > The number of traiset requests and plan alternatives can be
> > reduced.
> > > > > > > The DerivedTraitSets can be used to derive the possible traitsets
> > > from
> > > > > > > Join, and pass through Project, Filter etc...
> > > > > > >
> > > > > > > This is just an example of non-distributed system, for distributed
> > > > system,
> > > > > > > it can save much more by considering the possible distribution
> > > > delivered
> > > > > > > by child operators.
> > > > > > >
> > > > > > > One thing that concerns me is it highly relies on the traiset
> > system
> > > > of the
> > > > > > > underlying physical system. Like Enumerable doesn't consider
> > > > distribution,
> > > > > > > because it is single-node system, but Hive/Flink are distributed
> > > > system.
> > > > > > > - Haisheng
> > > > > > >
> > > > > > > ------------------------------------------------------------------
> > > > > > > 发件人：Stamatis Zampetakis<za...@gmail.com>
> > > > > > > 日 期：2019年10月18日 14:53:41
> > > > > > > 收件人：<de...@calcite.apache.org>
> > > > > > > 主 题：Re: [DISCUSS] On-demand traitset request
> > > > > > >
> > > > > > > Hi Haisheng,
> > > > > > >
> > > > > > > This is an interesting topic but somehow in my mind I thought that
> > > this
> > > > > > > mechanism is already in place.
> > > > > > >
> > > > > > > When an operator (logical or physical) is created its traitset is
> > > > > > > determined in bottom-up fashion using the create
> > > > > > > static factory method present in almost all operators. In my mind
> > > this
> > > > is
> > > > > > > in some sense the applicability function
> > > > > > > mentioned in [1].
> > > > > > >
> > > > > > > Now during optimization we proceed in top-down manner and we
> > request
> > > > > > > certain traitsets from the operators.
> > > > > > > If it happens and they contain already the requested traits nothing
> > > > needs
> > > > > > > to be done.
> > > > > > >
> > > > > > > In your example when we are about to create the sort-merge join we
> > > can
> > > > > > > check what traitsets are present in the inputs
> > > > > > > and if possible request those. Can you elaborate a bit more why do
> > we
> > > > need
> > > > > > > a new type of metadata?
> > > > > > >
> > > > > > > Anyway if we cannot do it at the moment it makes sense to complete
> > > the
> > > > > > > missing bits since what you are describing
> > > > > > > was already mentioned in the original design of the Volcano
> > optimizer
> > > > [1].
> > > > > > >
> > > > > > > "If a move to be pursued is the exploration of a normal query
> > > > processing
> > > > > > > algorithm such as merge-join, its cost is calculated by the
> > > algorithm's
> > > > > > > cost function. The algorithm's applicability function determines
> > the
> > > > > > > physical properly vectors for the algorithms inputs, and their
> > costs
> > > > and
> > > > > > > optimal plans are found by invoking FindBestPlan for the inputs.
> > For
> > > > some
> > > > > > > binary operators, the actual physical properties of the inputs are
> > > not
> > > > as
> > > > > > > important as the consistency of physical properties among the
> > inputs.
> > > > For
> > > > > > > example, for a sort-based implementation of intersection, i.e., an
> > > > > > > algorithm very similar to merge-join, any sort order of the two
> > > inputs
> > > > will
> > > > > > > suffice as long as the two inputs are sorted in the same way.
> > > > Similarly,
> > > > > > > for a parallel join, any partitioning of join inputs across
> > multiple
> > > > > > > processing nodes is acceptable if both inputs are partitioned using
> > > > > > > Compatible partitioning rules. For these cases, the search engine
> > > > permits
> > > > > > > the optimizer implementor to specify a number of physical property
> > > > vectors
> > > > > > > to be tried. For example, for the intersection of two inputs R and
> > S
> > > > with
> > > > > > > attributes A, B, and C where R is sorted on (A,B,C) and S is sorted
> > > on
> > > > > > > (B,A,C), both these sort orders can be specified by the optimizer
> > > > > > > implementor and will be optimized by the generated optimizer, while
> > > > other
> > > > > > > possible sort orders, e.g., (C,B,A), will be ignored. " [1]
> > > > > > >
> > > > > > > Best,
> > > > > > > Stamatis
> > > > > > >
> > > > > > > [1]
> > > > > > >
> > > >
> > >
> > https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Volcano-graefe.pdf
> > > > > > >
> > > > > > > On Fri, Oct 18, 2019 at 4:56 AM Haisheng Yuan <
> > > h.yuan@alibaba-inc.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > TL;DR
> > > > > > > > Both top-down physical TraitSet request and bottom-up TraitSet
> > > > > > > > derivation have their strongth and weakness, we propose
> > > > > > > > on-demand TraitSet request to combine the above two, to reduce
> > > > > > > > the number of plan alternatives that are genereated, especially
> > > > > > > > in distributed system.
> > > > > > > >
> > > > > > > > e.g.
> > > > > > > > select * from foo join bar on f1=b1 and f2=b2 and f3=b3;
> > > > > > > >
> > > > > > > > In non-distributed system, we can generate a sort merge join,
> > > > > > > > requesting foo sorted by f1,f2,f3 and bar sorted by b1,b2,b3.
> > > > > > > > But if foo happens to be sorted by f3,f2,f1, we may miss the
> > > > > > > > chance of making use of the delivered ordering of foo. Because
> > > > > > > > if we require bar to be sorted by b3,b2,b1, we don't need to
> > > > > > > > sort on foo anymore. There are so many choices, n!, not even
> > > > > > > > considering asc/desc and null direction. We can't request all
> > > > > > > > the possible traitsets in top-down way, and can't derive all the
> > > > > > > > possible traitsets in bottom-up way either.
> > > > > > > >
> > > > > > > > We propose on-demand traitset request by adding a new type
> > > > > > > > of metadata DerivedTraitSets into the built-in metadata system.
> > > > > > > >
> > > > > > > > List<RelTraitSet> deriveTraitSets(RelNode, RelMetadataQuery)
> > > > > > > >
> > > > > > > > In this metadata, every operator returns several possbile
> > traitsets
> > > > > > > > that may be derived from this operator.
> > > > > > > >
> > > > > > > > Using above query as an example, the tablescan on foo should
> > > > > > > > return traiset with collation on f3, f2, f1.
> > > > > > > >
> > > > > > > > In physical implementation rules, e.g. the SortMergeJoinRule,
> > > > > > > > it gets possible traitsets from both child operators, uses the
> > join
> > > > > > > > keys to eliminate useless traitsets, leaves out usefull traitsets,
> > > > > > > > and requests corresponding traitset on the other child.
> > > > > > > >
> > > > > > > > This relies on the feature of AbstractConverter, which is turned
> > > > > > > > off by default, due to performance issue [1].
> > > > > > > >
> > > > > > > > Thoughts?
> > > > > > > >
> > > > > > > > [1] https://issues.apache.org/jira/browse/CALCITE-2970
> > > > > > > >
> > > > > > > > Haisheng
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > >
> > > >
> > >
> >
> >

Re: Re: Re: [DISCUSS] On-demand traitset request

Posted by Haisheng Yuan <h....@alibaba-inc.com>.

I didn't say adding to RelNode, but a new API/interface for physical operator only.

What matters is not the number of interfaces, but the necessity of these methods.

- Haisheng

------------------------------------------------------------------
发件人：Danny Chan<yu...@gmail.com>
日　期：2019年10月25日 09:55:56
收件人：<de...@calcite.apache.org>
主　题：Re: Re: [DISCUSS] On-demand traitset request

I have the same feeling, it seems to much interfaces for the physical node(we do not really have physical class for physical nodes yet), so these interfaces may just be put into the RelNode, that was too complex and to much for me, can we have a way that do not modify the nodes itself ?

Best,
Danny Chan
在 2019年10月23日 +0800 PM2:53，Stamatis Zampetakis <za...@gmail.com>，写道：
> Overall, I agree that better encapsulation of propagation and derivation of
> traits would be beneficial for our system.
>
> Regarding the API proposed by Haisheng, I have to think a bit more on it.
> At first glance, adding such methods directly in the RelNode API does not
> appear an ideal solution since I don't see how easily it can be extended to
> support other kinds of traits.
>
> Best,
> Stamatis
>
> On Mon, Oct 21, 2019 at 7:31 AM Haisheng Yuan <h....@alibaba-inc.com>
> wrote:
>
> > To Stamatis,
> > Not exactly. My initial thought was giving the physical operator the
> > abiity to customize and fully control physical property derivation
> > strategy, thus can further help the purpose driven trait request. But since
> > we agree to think more high-level API to support on-demand traitset
> > request, I will illustrate what API is expected from implentator's
> > perspective.
> >
> > Jingfeng gave us basic steps on how the plan might be generated using
> > top-down purpose driven only manner, I think differently with the first
> > several steps.
> >
> > SELECT DISTINCT c, b FROM
> > ( SELECT R.c c, S.b b FROM R, S
> > WHERE R.a=S.a and R.b=S.b and R.c=S.c) t;
> >
> > Aggregate . (c, b)
> > +--- MergeJoin . (a, b, c)
> > |--- TableScan on R
> > +-- TableScan on S
> >
> > 1. Aggreate require collation (c,b) from its child, not permutation.
> > 2. MergeJoin's parent require (c,b), it has 2 options. Pass it down, or
> > ignore it.
> > a) Pass down. it has join condition on (a,b,c), the required columns
> > can be coverd by join condition columns, so MergeJoin will try to deliver
> > (c,b,a), and both children must exact match. Then we will have sort on both
> > children of MergeJoin.
> > b) Ignore it. Require its first child collation on (a,b,c), but
> > matching type is subset. R delivers (c,b,a). Then using the first child's
> > derived collation trait to require its second child to exact match. Thus we
> > have a sort on S, and a sort on top of MergeJoin.
> >
> > Both plan might be good or bad. If R, S are large, but the join result is
> > small, plan b) might be better, otherwise plan a) might be better.
> >
> > Anyway, I hope the physical operators can have full control the physical
> > properties requests and derivation, in physical operator class itself, not
> > rules, not other places.
> >
> > Per our experience, we have spent too much time on writing code for
> > dealing with all kinds of property requirement and derivation. But in fact,
> > life should be easier. I would like to the physical operator provides the
> > following API, and the 3rd party implementator just need to
> > override/implement them, no more need to be taken care.
> >
> > 1. void setDistributionRequests(int numReq)
> > Each operator can specify how many optimzation requests on some trait it
> > want to do. e.g. HashJoin may request the following distribution on both
> > children:
> > - (hash distribution on key1, hash distribution on key1)
> > - (hash distribution on key2, hash distribution on key2)
> > - (hash distribution on all keys, hash distribution on all keys)
> > - (Any, Broadcast)
> > - (Gather, Gather)
> >
> > 2. RelDistribution requiredDistribution(RelDistribution required, int
> > child) //same for collation
> > Given the required distribution from parent operator, returns the required
> > distribution for its nth child.
> >
> > 3. RelDistribution derivedDistribution() //same for collation
> > Derive the distribution of the operator itelf from child operators.
> >
> > 4. MatchType distributionMatchType(int child) //same for collation
> > Returns the distribution match type for its nth child, how does it match
> > the other children.
> > Similar with Jinfeng's point, I think there should be 3 types of matching:
> > exact, satisfy, subset.
> > e.g.
> > R is distributed by (a), S is distributed by (a,b)
> > select * from R join S using a,b,c
> > If we have plan
> > HashJoin
> > |-- TableScan on R
> > +-- TableScan on S
> > We may require the match type on S to be satisfy. (a,b) satisfies required
> > distribution (a,b,c).
> > Fot the outer child R, we require it to be exact match with inner.
> >
> > 5. ExecOrder getExecOrder()
> > Returns how the operator's children is executed, left to right, or right
> > to left. Typically, hash join is right to left. We might use this as the
> > optimization order. To make sure we have correct plans, we have to optimize
> > child and enforce properties in the order that is specific to the physical
> > operator.
> > All the other dirty work should be done by the optimization engine, but
> > not through rules, I believe. However, I havn't got any clear plan on how
> > to achieve it inside the engine.
> >
> > Haisheng
> >
> > ------------------------------------------------------------------
> > 发件人：Jacques Nadeau<ja...@apache.org>
> > 日 期：2019年10月21日 11:04:19
> > 收件人：<de...@calcite.apache.org>
> > 主 题：Re: [DISCUSS] On-demand traitset request
> >
> > Definitely agree that this has been a long time missing. I've been
> > challenged by this absence since before Calcite was Calcite. I also
> > remember the trials and tribulations around this that Jinfeng references
> > above.
> >
> > In general, I think the first thing one might want to before actually doing
> > this is to make trait derivation internally defined based on the impact
> > that a rel node has on traits. I've always found the externally provided
> > rel traits to be problematic and a potential place for hidden bugs (row
> > type has the same problem) . It means that trait derivation of a relnode is
> > based on the rules that do transformation as opposed to the "physical"
> > impact of the relnode. (It also leads to derivation behavior for a relnode
> > being scattered in many different rules.) If moved to the rel node, it also
> > provides a second benefit, once you encapsulate this propagation logic, you
> > could also expose this as a trait derivation function that the planner
> > could use to seek out derivation paths.
> >
> > At Dremio we toyed last year with the idea of adding a heuristic cycle on
> > top of the existing volano planner and relset state. In this model a
> > RelNode would have two additional methods: it would expose a trait
> > propagation function (as described above) and optionally expose one or more
> > specific traits this node desired. When the planner arrived at a
> > conclusion, you'd run the heuristic cycle to further propagate desired
> > traits (if possible) and then restart the planning cycle based on any new
> > transformations done during the heuristic stage. You'd then repeat this
> > volcano/trait prop cycle until you arrive at a "completed" state.
> >
> > We never actually got to implementation but I'm super supportive of someone
> > picking this up.
> >
> >
> >
> > On Sat, Oct 19, 2019 at 12:25 AM Stamatis Zampetakis <za...@gmail.com>
> > wrote:
> >
> > > Thanks all for the very interesting usecases and helpful examples.
> > >
> > > I would like to stay a bit on the fact that logical operators do not have
> > > physical traits. Calcite's logical operators do have at least one
> > physical
> > > trait which is Convention.NONE. Other logical operators such as:
> > >
> > > LogicalTableScan [1]
> > > LogicalFilter [2]
> > > LogicalProject [3]
> > > LogicalWindow [4]
> > >
> > > have additional traits regarding collation and distribution. There is
> > > already some sort of trait derivation so to some extend it is possible to
> > > check the traitset of the child (logical) operator before requesting some
> > > other traitset when creating the parent (physical).
> > >
> > > I see that this mechanism of adding explicitly traits to logical
> > operators
> > > may be confusing and may also lead to planning problems. Replacing it by
> > > metadata might be a good idea and it is closer to the idea of
> > > "applicability function" mentioned in the Volcano paper. Assuming that we
> > > follow this approach I would assume that the traitset of logical
> > operators
> > > from now on should be always empty.
> > >
> > > Is this what you have in mind Haisheng?
> > >
> > > Best,
> > > Stamatis
> > >
> > > [1]
> > >
> > >
> > https://github.com/apache/calcite/blob/cd24cae77072e56e4333d10114bf380be79709f1/core/src/main/java/org/apache/calcite/rel/logical/LogicalTableScan.java#L95
> > > [2]
> > >
> > >
> > https://github.com/apache/calcite/blob/cd24cae77072e56e4333d10114bf380be79709f1/core/src/main/java/org/apache/calcite/rel/logical/LogicalFilter.java#L105
> > > [3]
> > >
> > >
> > https://github.com/apache/calcite/blob/cd24cae77072e56e4333d10114bf380be79709f1/core/src/main/java/org/apache/calcite/rel/logical/LogicalProject.java#L104
> > > [4]
> > >
> > >
> > https://github.com/apache/calcite/blob/cd24cae77072e56e4333d10114bf380be79709f1/core/src/main/java/org/apache/calcite/rel/logical/LogicalWindow.java#L95
> > >
> > > On Sat, Oct 19, 2019 at 7:39 AM Xiening Dai <xn...@gmail.com> wrote:
> > >
> > > > Thanks for the sharing. I like the way you model this problem, Jinfeng.
> > > >
> > > > There’s one minor issue with your example. Let say if R and S doesn’t
> > > have
> > > > sorting properties at all. In your case, we would end up adding
> > enforcers
> > > > for LHS and RHS to get collation (a, b, c). Then we would need another
> > > > enforcer to get collation (b, c). This is a sub optimal plan as we
> > could
> > > > have use (b, c, a) for join.
> > > >
> > > > I think in step #2, the join operator would need to take the agg trait
> > > > requirement into account. Then it would have two options -
> > > >
> > > > 1) require *exact/super* match of (b, c, a) or (c, b, a); this is to
> > > > guarantee the join output would deliver the collation agg needs.
> > > > 2) require permutation match of (a, b, c); in such case, an enforcer
> > > might
> > > > be needed for aggregation.
> > > >
> > > > Eventually the cost model decides who is the winner.
> > > >
> > > > There’s a fundamental difference between your model and Haisheng’s
> > > > proposal. In Haisheng’s case, a rel node not only looks at its parent’s
> > > > requirement, but also tries to get the potential traits its input could
> > > > deliver. It would try to align them to eliminate unnecessary
> > > alternatives.
> > > >
> > > > In above example, assuming R is (b, c, a) and S is (a, b, c), to
> > > implement
> > > > option 1), we would generate two alternatives -
> > > >
> > > > MergeJoin (b, c, a)
> > > > TableScan R
> > > > Sort(b, c, a)
> > > > TableScan S
> > > >
> > > > MergeJoin(c, b, a)
> > > > Sort(c, b, a)
> > > > TableScan R
> > > > Sort(c, b, a)
> > > > TableScan S
> > > >
> > > > But if we look at the input traits and has the insight that R already
> > > > delivers (b, c, a), we could decide to require (b, c, a) only and avoid
> > > > generating the 2nd plan, which is definitely worse, and reduce the
> > search
> > > > space.
> > > >
> > > >
> > > > > On Oct 18, 2019, at 4:57 PM, Jinfeng Ni <jn...@apache.org> wrote:
> > > > >
> > > > > A little bit of history. In Drill, when we first implemented
> > > > > Distribution trait's definition, we allows both exact match and
> > > > > partial match in satisfy() method. This works fine for single-input
> > > > > operator such aggregation, however it leads to incorrect plan for
> > join
> > > > > query, i.e LHS shuffle with (a, b), RHS shuffle with (a) . At that
> > > > > time, we removed partial match, and use exact match only. Yet this
> > > > > changes leads to unnecessary additional exchange. To mitigate this
> > > > > problem, in join physical operator, for a join key (a, b, c), we
> > > > > enumerate different distribution requests, yet this lead to more
> > space
> > > > > to explore and significantly increase planning time (which is
> > probably
> > > > > what Haisheng also experienced). When I look back, I feel probably
> > > > > what we miss is the "coordination" step in the join operator, because
> > > > > if we relax the requirement of satisfy(), for multi-input operators,
> > > > > we have to enforce some "coordination", to make sure multiple input's
> > > > > trait could work together properly.
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Oct 18, 2019 at 4:38 PM Jinfeng Ni <jn...@apache.org> wrote:
> > > > > >
> > > > > > This is an interesting topic. Thanks for bringing up this issue.
> > > > > >
> > > > > > My understanding of Volcano planner is it works in a top-down search
> > > > > > mode (the parent asks for certain trait of its child), while the
> > trait
> > > > > > propagates in a bottom-up way, as Stamatis explained.
> > > > > >
> > > > > > IMHO, the issue comes down to the definition of RelTrait, how to
> > > > > > determine if a trait A could satisfy a request asking for trait B,
> > > > > > that is, how RelTrait.satisfies() method is implemented.
> > > > > >
> > > > > > Let's first clarify different situations, using collation as
> > example.
> > > > > > 1) The collation is requested by query's outmost ORDER BY clause.
> > > > > > - The generated plan has to have "exact match", i.e same collation
> > > > > > (same column sequence), or "super match" .
> > > > > > exact match: (a, b) satisfy (a, b)
> > > > > > super match: (a, b, c) satisfy (a, b)
> > > > > >
> > > > > > 2) The collation is requested by operand with single input, such as
> > > > > > sort-based Aggregation.
> > > > > > - In such case, a "permutation match" is sufficient.
> > > > > > For instance, for Aggregation (b,c), input with collation (c, b)
> > > > > > could satisfy the requirement.
> > > > > > permutation match: (b, c) satisfy (c, b). (c, b) satisfy (c,
> > > b)
> > > > > > permutation match: (b, c, a) satisfy (c, b). (c, b, a) satisfy
> > > (c,
> > > > b)
> > > > > >
> > > > > > 3) The collation is requested by operand with >= 2 inputs, such as
> > > > > > sort-based MergeJoin.
> > > > > > - A permutation match is sufficient for each input
> > > > > > - MergeJoin has to do coordination, after input's trait propagates
> > > > > > upwards. In other words, ensure both inputs's permutation match are
> > > > > > actually same sequence. Otherwise, enforcer could be inserted upon
> > > > > > each input, and the planner generates two plans and let the cost
> > > > > > decide.
> > > > > >
> > > > > > For the first case, this is how today's RelCollation's satisfy()
> > > > > > method is implemented.
> > > > > >
> > > > > > For the second / third cases, use Haisheng's example,
> > > > > >
> > > > > > SELECT DISTINCT c, b FROM
> > > > > > ( SELECT R.c c, S.b b FROM R, S
> > > > > > WHERE R.a=S.a and R.b=S.b and R.c=S.c) t;
> > > > > >
> > > > > > Aggregate . (c, b)
> > > > > > +--- MergeJoin . (a, b, c)
> > > > > > |--- TableScan on R
> > > > > > +--- TableScan on S
> > > > > >
> > > > > > Here is the steps that might take place in the planner:
> > > > > >
> > > > > > 1) Aggregate request permutation match collation (c, b)
> > > > > > 2) MergeJoin request a permutation match of (a, b,c) on both it's
> > > input
> > > > > > 3) R respond with collation (c, b, a), which satisfy MergeJoin's LHS
> > > > requirement
> > > > > > 4) S respond with collation (b, c, a), which satisfy MergeJoins' RHS
> > > > requirement
> > > > > > 5) MergeJoin do a coordination o LHS, RHS, and generate two possible
> > > > plans
> > > > > > MJ1: Insert a sort of (c, b, a) on RHS. This MJ operator now has
> > > > > > collation of (c, b, a)
> > > > > > MJ2: Insert a sort of (b, c, a) on LHS. This MJ operator now has
> > > > > > collation of (b, c, a)
> > > > > > 6) MJ1 and MJ2 could both satisfy permutation match request in step
> > > > > > 1, leading to two possible plans:
> > > > > > Agg1: with input of MJ1
> > > > > > Agg2: with input of MJ2
> > > > > > 7) planner chooses a best plan based on cost of Agg1 and Agg2.
> > > > > >
> > > > > > I should point that the enforcer sort inserted in step 5 could help
> > > > > > remove redundant sort in its input, if the input's collation is
> > > > > > obtained from sort, by invoking Calcite's SortRemove Rule.
> > > > > >
> > > > > > The above only considers the column sequence. The DESC/ASC, NULL
> > > > > > FIRST/LAST will add more complexity, but we probably use similar
> > idea.
> > > > > >
> > > > > > In summary, we need :
> > > > > > 1) redefine collation trait's satisfy() policy, exact match, super
> > > > > > match, permutation match,
> > > > > > 2) different physical operator applies different trait matching
> > > > > > policy, depending on operator's # of inputs, and algorithm
> > > > > > implementation.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, Oct 18, 2019 at 2:51 PM Haisheng Yuan <
> > h.yuan@alibaba-inc.com
> > > >
> > > > wrote:
> > > > > > >
> > > > > > > Hi Stamatis,
> > > > > > >
> > > > > > > Thanks for your comment. I think my example didn't make it clear.
> > > > > > >
> > > > > > > When a logical operator is created, it doesn't have any physical,
> > > > > > > propertyand it shouldn't have. When a physical operator is created,
> > > > > > > e.g. in Enumerable convention, it only creates an intuitive
> > traitset
> > > > > > > with it, and requests it children the corresponding ones.
> > > > > > >
> > > > > > > For operators such as Join, Aggregate, Window, which may deliver
> > > > > > > multiple different traitsets, when the parent operator is created
> > and
> > > > > > > request its traitset, it might be good to know what are the
> > poosible
> > > > > > > traitset that the child operator can deliver. e.g.
> > > > > > >
> > > > > > > SELECT DISTINCT c, b FROM
> > > > > > > ( SELECT R.c c, S.b b FROM R, S
> > > > > > > WHERE R.a=S.a and R.b=S.b and R.c=S.c) t;
> > > > > > >
> > > > > > > Suppose R is ordered by (c, b, a), and S is ordered by (b, c, a).
> > > > > > > Here is the logical plan:
> > > > > > > Aggregate
> > > > > > > +--- InnerJoin
> > > > > > > |--- TableScan on R
> > > > > > > +--- TableScan on S
> > > > > > >
> > > > > > > When we create a physical merge join for the inner join, it may
> > just
> > > > > > > have collation sorted on a,b,c. Then the aggreate on top of join
> > will
> > > > > > > request another sort on c,b, thus we miss the best plan. What we
> > > > > > > can do is requesting all the order combinations, which is n!, like
> > > > > > > how the Values operator does. But that is too much.
> > > > > > >
> > > > > > > If we can provide an approach that can minimize the possiple
> > traitset
> > > > > > > that the child operator may deliver, we can reduce the chance of
> > > > missing
> > > > > > > good plans. For the above query, the Aggregate operator can derive
> > > > > > > possible traitsets that its child operator join can deliver, in
> > which
> > > > case,
> > > > > > > the possiple traitsets of join is
> > > > > > > 1. collation on (a,b,c) based on join condition,
> > > > > > > 2. collation on (c,b,a) based on left child,
> > > > > > > 3. collation on (b,c,a) based on right child
> > > > > > > So we can request Aggregate sorted by (c,b) and Join sorted by
> > > (c,b,a).
> > > > > > > The number of traiset requests and plan alternatives can be
> > reduced.
> > > > > > > The DerivedTraitSets can be used to derive the possible traitsets
> > > from
> > > > > > > Join, and pass through Project, Filter etc...
> > > > > > >
> > > > > > > This is just an example of non-distributed system, for distributed
> > > > system,
> > > > > > > it can save much more by considering the possible distribution
> > > > delivered
> > > > > > > by child operators.
> > > > > > >
> > > > > > > One thing that concerns me is it highly relies on the traiset
> > system
> > > > of the
> > > > > > > underlying physical system. Like Enumerable doesn't consider
> > > > distribution,
> > > > > > > because it is single-node system, but Hive/Flink are distributed
> > > > system.
> > > > > > > - Haisheng
> > > > > > >
> > > > > > > ------------------------------------------------------------------
> > > > > > > 发件人：Stamatis Zampetakis<za...@gmail.com>
> > > > > > > 日 期：2019年10月18日 14:53:41
> > > > > > > 收件人：<de...@calcite.apache.org>
> > > > > > > 主 题：Re: [DISCUSS] On-demand traitset request
> > > > > > >
> > > > > > > Hi Haisheng,
> > > > > > >
> > > > > > > This is an interesting topic but somehow in my mind I thought that
> > > this
> > > > > > > mechanism is already in place.
> > > > > > >
> > > > > > > When an operator (logical or physical) is created its traitset is
> > > > > > > determined in bottom-up fashion using the create
> > > > > > > static factory method present in almost all operators. In my mind
> > > this
> > > > is
> > > > > > > in some sense the applicability function
> > > > > > > mentioned in [1].
> > > > > > >
> > > > > > > Now during optimization we proceed in top-down manner and we
> > request
> > > > > > > certain traitsets from the operators.
> > > > > > > If it happens and they contain already the requested traits nothing
> > > > needs
> > > > > > > to be done.
> > > > > > >
> > > > > > > In your example when we are about to create the sort-merge join we
> > > can
> > > > > > > check what traitsets are present in the inputs
> > > > > > > and if possible request those. Can you elaborate a bit more why do
> > we
> > > > need
> > > > > > > a new type of metadata?
> > > > > > >
> > > > > > > Anyway if we cannot do it at the moment it makes sense to complete
> > > the
> > > > > > > missing bits since what you are describing
> > > > > > > was already mentioned in the original design of the Volcano
> > optimizer
> > > > [1].
> > > > > > >
> > > > > > > "If a move to be pursued is the exploration of a normal query
> > > > processing
> > > > > > > algorithm such as merge-join, its cost is calculated by the
> > > algorithm's
> > > > > > > cost function. The algorithm's applicability function determines
> > the
> > > > > > > physical properly vectors for the algorithms inputs, and their
> > costs
> > > > and
> > > > > > > optimal plans are found by invoking FindBestPlan for the inputs.
> > For
> > > > some
> > > > > > > binary operators, the actual physical properties of the inputs are
> > > not
> > > > as
> > > > > > > important as the consistency of physical properties among the
> > inputs.
> > > > For
> > > > > > > example, for a sort-based implementation of intersection, i.e., an
> > > > > > > algorithm very similar to merge-join, any sort order of the two
> > > inputs
> > > > will
> > > > > > > suffice as long as the two inputs are sorted in the same way.
> > > > Similarly,
> > > > > > > for a parallel join, any partitioning of join inputs across
> > multiple
> > > > > > > processing nodes is acceptable if both inputs are partitioned using
> > > > > > > Compatible partitioning rules. For these cases, the search engine
> > > > permits
> > > > > > > the optimizer implementor to specify a number of physical property
> > > > vectors
> > > > > > > to be tried. For example, for the intersection of two inputs R and
> > S
> > > > with
> > > > > > > attributes A, B, and C where R is sorted on (A,B,C) and S is sorted
> > > on
> > > > > > > (B,A,C), both these sort orders can be specified by the optimizer
> > > > > > > implementor and will be optimized by the generated optimizer, while
> > > > other
> > > > > > > possible sort orders, e.g., (C,B,A), will be ignored. " [1]
> > > > > > >
> > > > > > > Best,
> > > > > > > Stamatis
> > > > > > >
> > > > > > > [1]
> > > > > > >
> > > >
> > >
> > https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Volcano-graefe.pdf
> > > > > > >
> > > > > > > On Fri, Oct 18, 2019 at 4:56 AM Haisheng Yuan <
> > > h.yuan@alibaba-inc.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > TL;DR
> > > > > > > > Both top-down physical TraitSet request and bottom-up TraitSet
> > > > > > > > derivation have their strongth and weakness, we propose
> > > > > > > > on-demand TraitSet request to combine the above two, to reduce
> > > > > > > > the number of plan alternatives that are genereated, especially
> > > > > > > > in distributed system.
> > > > > > > >
> > > > > > > > e.g.
> > > > > > > > select * from foo join bar on f1=b1 and f2=b2 and f3=b3;
> > > > > > > >
> > > > > > > > In non-distributed system, we can generate a sort merge join,
> > > > > > > > requesting foo sorted by f1,f2,f3 and bar sorted by b1,b2,b3.
> > > > > > > > But if foo happens to be sorted by f3,f2,f1, we may miss the
> > > > > > > > chance of making use of the delivered ordering of foo. Because
> > > > > > > > if we require bar to be sorted by b3,b2,b1, we don't need to
> > > > > > > > sort on foo anymore. There are so many choices, n!, not even
> > > > > > > > considering asc/desc and null direction. We can't request all
> > > > > > > > the possible traitsets in top-down way, and can't derive all the
> > > > > > > > possible traitsets in bottom-up way either.
> > > > > > > >
> > > > > > > > We propose on-demand traitset request by adding a new type
> > > > > > > > of metadata DerivedTraitSets into the built-in metadata system.
> > > > > > > >
> > > > > > > > List<RelTraitSet> deriveTraitSets(RelNode, RelMetadataQuery)
> > > > > > > >
> > > > > > > > In this metadata, every operator returns several possbile
> > traitsets
> > > > > > > > that may be derived from this operator.
> > > > > > > >
> > > > > > > > Using above query as an example, the tablescan on foo should
> > > > > > > > return traiset with collation on f3, f2, f1.
> > > > > > > >
> > > > > > > > In physical implementation rules, e.g. the SortMergeJoinRule,
> > > > > > > > it gets possible traitsets from both child operators, uses the
> > join
> > > > > > > > keys to eliminate useless traitsets, leaves out usefull traitsets,
> > > > > > > > and requests corresponding traitset on the other child.
> > > > > > > >
> > > > > > > > This relies on the feature of AbstractConverter, which is turned
> > > > > > > > off by default, due to performance issue [1].
> > > > > > > >
> > > > > > > > Thoughts?
> > > > > > > >
> > > > > > > > [1] https://issues.apache.org/jira/browse/CALCITE-2970
> > > > > > > >
> > > > > > > > Haisheng
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > >
> > > >
> > >
> >
> >

Re: Re: [DISCUSS] On-demand traitset request

Posted by Stamatis Zampetakis <za...@gmail.com>.

I would like a further clarification regarding the methods:
derivedDistribution()
derivedCollation()

What would be the difference with the existing derivation mechanism in
RelMdDistribution [1], and RelMdCollation [2].
They are not sufficient to provide the necessary information?

[1]
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/metadata/RelMdDistribution.java
[2]
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/metadata/RelMdCollation.java


On Fri, Oct 25, 2019 at 3:56 AM Danny Chan <yu...@gmail.com> wrote:

> I have the same feeling, it seems to much interfaces for the physical
> node(we do not really have physical class for physical nodes yet), so these
> interfaces may just be put into the RelNode, that was too complex and to
> much for me, can we have a way that do not modify the nodes itself ?
>
> Best,
> Danny Chan
> 在 2019年10月23日 +0800 PM2:53，Stamatis Zampetakis <za...@gmail.com>，写道：
> > Overall, I agree that better encapsulation of propagation and derivation
> of
> > traits would be beneficial for our system.
> >
> > Regarding the API proposed by Haisheng, I have to think a bit more on it.
> > At first glance, adding such methods directly in the RelNode API does not
> > appear an ideal solution since I don't see how easily it can be extended
> to
> > support other kinds of traits.
> >
> > Best,
> > Stamatis
> >
> > On Mon, Oct 21, 2019 at 7:31 AM Haisheng Yuan <h....@alibaba-inc.com>
> > wrote:
> >
> > > To Stamatis,
> > > Not exactly. My initial thought was giving the physical operator the
> > > abiity to customize and fully control physical property derivation
> > > strategy, thus can further help the purpose driven trait request. But
> since
> > > we agree to think more high-level API to support on-demand traitset
> > > request, I will illustrate what API is expected from implentator's
> > > perspective.
> > >
> > > Jingfeng gave us basic steps on how the plan might be generated using
> > > top-down purpose driven only manner, I think differently with the first
> > > several steps.
> > >
> > > SELECT DISTINCT c, b FROM
> > > ( SELECT R.c c, S.b b FROM R, S
> > > WHERE R.a=S.a and R.b=S.b and R.c=S.c) t;
> > >
> > > Aggregate . (c, b)
> > > +--- MergeJoin . (a, b, c)
> > > |--- TableScan on R
> > > +-- TableScan on S
> > >
> > > 1. Aggreate require collation (c,b) from its child, not permutation.
> > > 2. MergeJoin's parent require (c,b), it has 2 options. Pass it down, or
> > > ignore it.
> > > a) Pass down. it has join condition on (a,b,c), the required columns
> > > can be coverd by join condition columns, so MergeJoin will try to
> deliver
> > > (c,b,a), and both children must exact match. Then we will have sort on
> both
> > > children of MergeJoin.
> > > b) Ignore it. Require its first child collation on (a,b,c), but
> > > matching type is subset. R delivers (c,b,a). Then using the first
> child's
> > > derived collation trait to require its second child to exact match.
> Thus we
> > > have a sort on S, and a sort on top of MergeJoin.
> > >
> > > Both plan might be good or bad. If R, S are large, but the join result
> is
> > > small, plan b) might be better, otherwise plan a) might be better.
> > >
> > > Anyway, I hope the physical operators can have full control the
> physical
> > > properties requests and derivation, in physical operator class itself,
> not
> > > rules, not other places.
> > >
> > > Per our experience, we have spent too much time on writing code for
> > > dealing with all kinds of property requirement and derivation. But in
> fact,
> > > life should be easier. I would like to the physical operator provides
> the
> > > following API, and the 3rd party implementator just need to
> > > override/implement them, no more need to be taken care.
> > >
> > > 1. void setDistributionRequests(int numReq)
> > > Each operator can specify how many optimzation requests on some trait
> it
> > > want to do. e.g. HashJoin may request the following distribution on
> both
> > > children:
> > > - (hash distribution on key1, hash distribution on key1)
> > > - (hash distribution on key2, hash distribution on key2)
> > > - (hash distribution on all keys, hash distribution on all keys)
> > > - (Any, Broadcast)
> > > - (Gather, Gather)
> > >
> > > 2. RelDistribution requiredDistribution(RelDistribution required, int
> > > child) //same for collation
> > > Given the required distribution from parent operator, returns the
> required
> > > distribution for its nth child.
> > >
> > > 3. RelDistribution derivedDistribution() //same for collation
> > > Derive the distribution of the operator itelf from child operators.
> > >
> > > 4. MatchType distributionMatchType(int child) //same for collation
> > > Returns the distribution match type for its nth child, how does it
> match
> > > the other children.
> > > Similar with Jinfeng's point, I think there should be 3 types of
> matching:
> > > exact, satisfy, subset.
> > > e.g.
> > > R is distributed by (a), S is distributed by (a,b)
> > > select * from R join S using a,b,c
> > > If we have plan
> > > HashJoin
> > > |-- TableScan on R
> > > +-- TableScan on S
> > > We may require the match type on S to be satisfy. (a,b) satisfies
> required
> > > distribution (a,b,c).
> > > Fot the outer child R, we require it to be exact match with inner.
> > >
> > > 5. ExecOrder getExecOrder()
> > > Returns how the operator's children is executed, left to right, or
> right
> > > to left. Typically, hash join is right to left. We might use this as
> the
> > > optimization order. To make sure we have correct plans, we have to
> optimize
> > > child and enforce properties in the order that is specific to the
> physical
> > > operator.
> > > All the other dirty work should be done by the optimization engine, but
> > > not through rules, I believe. However, I havn't got any clear plan on
> how
> > > to achieve it inside the engine.
> > >
> > > Haisheng
> > >
> > > ------------------------------------------------------------------
> > > 发件人：Jacques Nadeau<ja...@apache.org>
> > > 日 期：2019年10月21日 11:04:19
> > > 收件人：<de...@calcite.apache.org>
> > > 主 题：Re: [DISCUSS] On-demand traitset request
> > >
> > > Definitely agree that this has been a long time missing. I've been
> > > challenged by this absence since before Calcite was Calcite. I also
> > > remember the trials and tribulations around this that Jinfeng
> references
> > > above.
> > >
> > > In general, I think the first thing one might want to before actually
> doing
> > > this is to make trait derivation internally defined based on the impact
> > > that a rel node has on traits. I've always found the externally
> provided
> > > rel traits to be problematic and a potential place for hidden bugs (row
> > > type has the same problem) . It means that trait derivation of a
> relnode is
> > > based on the rules that do transformation as opposed to the "physical"
> > > impact of the relnode. (It also leads to derivation behavior for a
> relnode
> > > being scattered in many different rules.) If moved to the rel node, it
> also
> > > provides a second benefit, once you encapsulate this propagation
> logic, you
> > > could also expose this as a trait derivation function that the planner
> > > could use to seek out derivation paths.
> > >
> > > At Dremio we toyed last year with the idea of adding a heuristic cycle
> on
> > > top of the existing volano planner and relset state. In this model a
> > > RelNode would have two additional methods: it would expose a trait
> > > propagation function (as described above) and optionally expose one or
> more
> > > specific traits this node desired. When the planner arrived at a
> > > conclusion, you'd run the heuristic cycle to further propagate desired
> > > traits (if possible) and then restart the planning cycle based on any
> new
> > > transformations done during the heuristic stage. You'd then repeat this
> > > volcano/trait prop cycle until you arrive at a "completed" state.
> > >
> > > We never actually got to implementation but I'm super supportive of
> someone
> > > picking this up.
> > >
> > >
> > >
> > > On Sat, Oct 19, 2019 at 12:25 AM Stamatis Zampetakis <
> zabetak@gmail.com>
> > > wrote:
> > >
> > > > Thanks all for the very interesting usecases and helpful examples.
> > > >
> > > > I would like to stay a bit on the fact that logical operators do not
> have
> > > > physical traits. Calcite's logical operators do have at least one
> > > physical
> > > > trait which is Convention.NONE. Other logical operators such as:
> > > >
> > > > LogicalTableScan [1]
> > > > LogicalFilter [2]
> > > > LogicalProject [3]
> > > > LogicalWindow [4]
> > > >
> > > > have additional traits regarding collation and distribution. There is
> > > > already some sort of trait derivation so to some extend it is
> possible to
> > > > check the traitset of the child (logical) operator before requesting
> some
> > > > other traitset when creating the parent (physical).
> > > >
> > > > I see that this mechanism of adding explicitly traits to logical
> > > operators
> > > > may be confusing and may also lead to planning problems. Replacing
> it by
> > > > metadata might be a good idea and it is closer to the idea of
> > > > "applicability function" mentioned in the Volcano paper. Assuming
> that we
> > > > follow this approach I would assume that the traitset of logical
> > > operators
> > > > from now on should be always empty.
> > > >
> > > > Is this what you have in mind Haisheng?
> > > >
> > > > Best,
> > > > Stamatis
> > > >
> > > > [1]
> > > >
> > > >
> > >
> https://github.com/apache/calcite/blob/cd24cae77072e56e4333d10114bf380be79709f1/core/src/main/java/org/apache/calcite/rel/logical/LogicalTableScan.java#L95
> > > > [2]
> > > >
> > > >
> > >
> https://github.com/apache/calcite/blob/cd24cae77072e56e4333d10114bf380be79709f1/core/src/main/java/org/apache/calcite/rel/logical/LogicalFilter.java#L105
> > > > [3]
> > > >
> > > >
> > >
> https://github.com/apache/calcite/blob/cd24cae77072e56e4333d10114bf380be79709f1/core/src/main/java/org/apache/calcite/rel/logical/LogicalProject.java#L104
> > > > [4]
> > > >
> > > >
> > >
> https://github.com/apache/calcite/blob/cd24cae77072e56e4333d10114bf380be79709f1/core/src/main/java/org/apache/calcite/rel/logical/LogicalWindow.java#L95
> > > >
> > > > On Sat, Oct 19, 2019 at 7:39 AM Xiening Dai <xn...@gmail.com>
> wrote:
> > > >
> > > > > Thanks for the sharing. I like the way you model this problem,
> Jinfeng.
> > > > >
> > > > > There’s one minor issue with your example. Let say if R and S
> doesn’t
> > > > have
> > > > > sorting properties at all. In your case, we would end up adding
> > > enforcers
> > > > > for LHS and RHS to get collation (a, b, c). Then we would need
> another
> > > > > enforcer to get collation (b, c). This is a sub optimal plan as we
> > > could
> > > > > have use (b, c, a) for join.
> > > > >
> > > > > I think in step #2, the join operator would need to take the agg
> trait
> > > > > requirement into account. Then it would have two options -
> > > > >
> > > > > 1) require *exact/super* match of (b, c, a) or (c, b, a); this is
> to
> > > > > guarantee the join output would deliver the collation agg needs.
> > > > > 2) require permutation match of (a, b, c); in such case, an
> enforcer
> > > > might
> > > > > be needed for aggregation.
> > > > >
> > > > > Eventually the cost model decides who is the winner.
> > > > >
> > > > > There’s a fundamental difference between your model and Haisheng’s
> > > > > proposal. In Haisheng’s case, a rel node not only looks at its
> parent’s
> > > > > requirement, but also tries to get the potential traits its input
> could
> > > > > deliver. It would try to align them to eliminate unnecessary
> > > > alternatives.
> > > > >
> > > > > In above example, assuming R is (b, c, a) and S is (a, b, c), to
> > > > implement
> > > > > option 1), we would generate two alternatives -
> > > > >
> > > > > MergeJoin (b, c, a)
> > > > > TableScan R
> > > > > Sort(b, c, a)
> > > > > TableScan S
> > > > >
> > > > > MergeJoin(c, b, a)
> > > > > Sort(c, b, a)
> > > > > TableScan R
> > > > > Sort(c, b, a)
> > > > > TableScan S
> > > > >
> > > > > But if we look at the input traits and has the insight that R
> already
> > > > > delivers (b, c, a), we could decide to require (b, c, a) only and
> avoid
> > > > > generating the 2nd plan, which is definitely worse, and reduce the
> > > search
> > > > > space.
> > > > >
> > > > >
> > > > > > On Oct 18, 2019, at 4:57 PM, Jinfeng Ni <jn...@apache.org> wrote:
> > > > > >
> > > > > > A little bit of history. In Drill, when we first implemented
> > > > > > Distribution trait's definition, we allows both exact match and
> > > > > > partial match in satisfy() method. This works fine for
> single-input
> > > > > > operator such aggregation, however it leads to incorrect plan for
> > > join
> > > > > > query, i.e LHS shuffle with (a, b), RHS shuffle with (a) . At
> that
> > > > > > time, we removed partial match, and use exact match only. Yet
> this
> > > > > > changes leads to unnecessary additional exchange. To mitigate
> this
> > > > > > problem, in join physical operator, for a join key (a, b, c), we
> > > > > > enumerate different distribution requests, yet this lead to more
> > > space
> > > > > > to explore and significantly increase planning time (which is
> > > probably
> > > > > > what Haisheng also experienced). When I look back, I feel
> probably
> > > > > > what we miss is the "coordination" step in the join operator,
> because
> > > > > > if we relax the requirement of satisfy(), for multi-input
> operators,
> > > > > > we have to enforce some "coordination", to make sure multiple
> input's
> > > > > > trait could work together properly.
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, Oct 18, 2019 at 4:38 PM Jinfeng Ni <jn...@apache.org>
> wrote:
> > > > > > >
> > > > > > > This is an interesting topic. Thanks for bringing up this
> issue.
> > > > > > >
> > > > > > > My understanding of Volcano planner is it works in a top-down
> search
> > > > > > > mode (the parent asks for certain trait of its child), while
> the
> > > trait
> > > > > > > propagates in a bottom-up way, as Stamatis explained.
> > > > > > >
> > > > > > > IMHO, the issue comes down to the definition of RelTrait, how
> to
> > > > > > > determine if a trait A could satisfy a request asking for
> trait B,
> > > > > > > that is, how RelTrait.satisfies() method is implemented.
> > > > > > >
> > > > > > > Let's first clarify different situations, using collation as
> > > example.
> > > > > > > 1) The collation is requested by query's outmost ORDER BY
> clause.
> > > > > > > - The generated plan has to have "exact match", i.e same
> collation
> > > > > > > (same column sequence), or "super match" .
> > > > > > > exact match: (a, b) satisfy (a, b)
> > > > > > > super match: (a, b, c) satisfy (a, b)
> > > > > > >
> > > > > > > 2) The collation is requested by operand with single input,
> such as
> > > > > > > sort-based Aggregation.
> > > > > > > - In such case, a "permutation match" is sufficient.
> > > > > > > For instance, for Aggregation (b,c), input with collation (c,
> b)
> > > > > > > could satisfy the requirement.
> > > > > > > permutation match: (b, c) satisfy (c, b). (c, b) satisfy (c,
> > > > b)
> > > > > > > permutation match: (b, c, a) satisfy (c, b). (c, b, a) satisfy
> > > > (c,
> > > > > b)
> > > > > > >
> > > > > > > 3) The collation is requested by operand with >= 2 inputs,
> such as
> > > > > > > sort-based MergeJoin.
> > > > > > > - A permutation match is sufficient for each input
> > > > > > > - MergeJoin has to do coordination, after input's trait
> propagates
> > > > > > > upwards. In other words, ensure both inputs's permutation
> match are
> > > > > > > actually same sequence. Otherwise, enforcer could be inserted
> upon
> > > > > > > each input, and the planner generates two plans and let the
> cost
> > > > > > > decide.
> > > > > > >
> > > > > > > For the first case, this is how today's RelCollation's
> satisfy()
> > > > > > > method is implemented.
> > > > > > >
> > > > > > > For the second / third cases, use Haisheng's example,
> > > > > > >
> > > > > > > SELECT DISTINCT c, b FROM
> > > > > > > ( SELECT R.c c, S.b b FROM R, S
> > > > > > > WHERE R.a=S.a and R.b=S.b and R.c=S.c) t;
> > > > > > >
> > > > > > > Aggregate . (c, b)
> > > > > > > +--- MergeJoin . (a, b, c)
> > > > > > > |--- TableScan on R
> > > > > > > +--- TableScan on S
> > > > > > >
> > > > > > > Here is the steps that might take place in the planner:
> > > > > > >
> > > > > > > 1) Aggregate request permutation match collation (c, b)
> > > > > > > 2) MergeJoin request a permutation match of (a, b,c) on both
> it's
> > > > input
> > > > > > > 3) R respond with collation (c, b, a), which satisfy
> MergeJoin's LHS
> > > > > requirement
> > > > > > > 4) S respond with collation (b, c, a), which satisfy
> MergeJoins' RHS
> > > > > requirement
> > > > > > > 5) MergeJoin do a coordination o LHS, RHS, and generate two
> possible
> > > > > plans
> > > > > > > MJ1: Insert a sort of (c, b, a) on RHS. This MJ operator now
> has
> > > > > > > collation of (c, b, a)
> > > > > > > MJ2: Insert a sort of (b, c, a) on LHS. This MJ operator now
> has
> > > > > > > collation of (b, c, a)
> > > > > > > 6) MJ1 and MJ2 could both satisfy permutation match request in
> step
> > > > > > > 1, leading to two possible plans:
> > > > > > > Agg1: with input of MJ1
> > > > > > > Agg2: with input of MJ2
> > > > > > > 7) planner chooses a best plan based on cost of Agg1 and Agg2.
> > > > > > >
> > > > > > > I should point that the enforcer sort inserted in step 5 could
> help
> > > > > > > remove redundant sort in its input, if the input's collation is
> > > > > > > obtained from sort, by invoking Calcite's SortRemove Rule.
> > > > > > >
> > > > > > > The above only considers the column sequence. The DESC/ASC,
> NULL
> > > > > > > FIRST/LAST will add more complexity, but we probably use
> similar
> > > idea.
> > > > > > >
> > > > > > > In summary, we need :
> > > > > > > 1) redefine collation trait's satisfy() policy, exact match,
> super
> > > > > > > match, permutation match,
> > > > > > > 2) different physical operator applies different trait matching
> > > > > > > policy, depending on operator's # of inputs, and algorithm
> > > > > > > implementation.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Oct 18, 2019 at 2:51 PM Haisheng Yuan <
> > > h.yuan@alibaba-inc.com
> > > > >
> > > > > wrote:
> > > > > > > >
> > > > > > > > Hi Stamatis,
> > > > > > > >
> > > > > > > > Thanks for your comment. I think my example didn't make it
> clear.
> > > > > > > >
> > > > > > > > When a logical operator is created, it doesn't have any
> physical,
> > > > > > > > propertyand it shouldn't have. When a physical operator is
> created,
> > > > > > > > e.g. in Enumerable convention, it only creates an intuitive
> > > traitset
> > > > > > > > with it, and requests it children the corresponding ones.
> > > > > > > >
> > > > > > > > For operators such as Join, Aggregate, Window, which may
> deliver
> > > > > > > > multiple different traitsets, when the parent operator is
> created
> > > and
> > > > > > > > request its traitset, it might be good to know what are the
> > > poosible
> > > > > > > > traitset that the child operator can deliver. e.g.
> > > > > > > >
> > > > > > > > SELECT DISTINCT c, b FROM
> > > > > > > > ( SELECT R.c c, S.b b FROM R, S
> > > > > > > > WHERE R.a=S.a and R.b=S.b and R.c=S.c) t;
> > > > > > > >
> > > > > > > > Suppose R is ordered by (c, b, a), and S is ordered by (b,
> c, a).
> > > > > > > > Here is the logical plan:
> > > > > > > > Aggregate
> > > > > > > > +--- InnerJoin
> > > > > > > > |--- TableScan on R
> > > > > > > > +--- TableScan on S
> > > > > > > >
> > > > > > > > When we create a physical merge join for the inner join, it
> may
> > > just
> > > > > > > > have collation sorted on a,b,c. Then the aggreate on top of
> join
> > > will
> > > > > > > > request another sort on c,b, thus we miss the best plan.
> What we
> > > > > > > > can do is requesting all the order combinations, which is
> n!, like
> > > > > > > > how the Values operator does. But that is too much.
> > > > > > > >
> > > > > > > > If we can provide an approach that can minimize the possiple
> > > traitset
> > > > > > > > that the child operator may deliver, we can reduce the
> chance of
> > > > > missing
> > > > > > > > good plans. For the above query, the Aggregate operator can
> derive
> > > > > > > > possible traitsets that its child operator join can deliver,
> in
> > > which
> > > > > case,
> > > > > > > > the possiple traitsets of join is
> > > > > > > > 1. collation on (a,b,c) based on join condition,
> > > > > > > > 2. collation on (c,b,a) based on left child,
> > > > > > > > 3. collation on (b,c,a) based on right child
> > > > > > > > So we can request Aggregate sorted by (c,b) and Join sorted
> by
> > > > (c,b,a).
> > > > > > > > The number of traiset requests and plan alternatives can be
> > > reduced.
> > > > > > > > The DerivedTraitSets can be used to derive the possible
> traitsets
> > > > from
> > > > > > > > Join, and pass through Project, Filter etc...
> > > > > > > >
> > > > > > > > This is just an example of non-distributed system, for
> distributed
> > > > > system,
> > > > > > > > it can save much more by considering the possible
> distribution
> > > > > delivered
> > > > > > > > by child operators.
> > > > > > > >
> > > > > > > > One thing that concerns me is it highly relies on the traiset
> > > system
> > > > > of the
> > > > > > > > underlying physical system. Like Enumerable doesn't consider
> > > > > distribution,
> > > > > > > > because it is single-node system, but Hive/Flink are
> distributed
> > > > > system.
> > > > > > > > - Haisheng
> > > > > > > >
> > > > > > > >
> ------------------------------------------------------------------
> > > > > > > > 发件人：Stamatis Zampetakis<za...@gmail.com>
> > > > > > > > 日 期：2019年10月18日 14:53:41
> > > > > > > > 收件人：<de...@calcite.apache.org>
> > > > > > > > 主 题：Re: [DISCUSS] On-demand traitset request
> > > > > > > >
> > > > > > > > Hi Haisheng,
> > > > > > > >
> > > > > > > > This is an interesting topic but somehow in my mind I
> thought that
> > > > this
> > > > > > > > mechanism is already in place.
> > > > > > > >
> > > > > > > > When an operator (logical or physical) is created its
> traitset is
> > > > > > > > determined in bottom-up fashion using the create
> > > > > > > > static factory method present in almost all operators. In my
> mind
> > > > this
> > > > > is
> > > > > > > > in some sense the applicability function
> > > > > > > > mentioned in [1].
> > > > > > > >
> > > > > > > > Now during optimization we proceed in top-down manner and we
> > > request
> > > > > > > > certain traitsets from the operators.
> > > > > > > > If it happens and they contain already the requested traits
> nothing
> > > > > needs
> > > > > > > > to be done.
> > > > > > > >
> > > > > > > > In your example when we are about to create the sort-merge
> join we
> > > > can
> > > > > > > > check what traitsets are present in the inputs
> > > > > > > > and if possible request those. Can you elaborate a bit more
> why do
> > > we
> > > > > need
> > > > > > > > a new type of metadata?
> > > > > > > >
> > > > > > > > Anyway if we cannot do it at the moment it makes sense to
> complete
> > > > the
> > > > > > > > missing bits since what you are describing
> > > > > > > > was already mentioned in the original design of the Volcano
> > > optimizer
> > > > > [1].
> > > > > > > >
> > > > > > > > "If a move to be pursued is the exploration of a normal query
> > > > > processing
> > > > > > > > algorithm such as merge-join, its cost is calculated by the
> > > > algorithm's
> > > > > > > > cost function. The algorithm's applicability function
> determines
> > > the
> > > > > > > > physical properly vectors for the algorithms inputs, and
> their
> > > costs
> > > > > and
> > > > > > > > optimal plans are found by invoking FindBestPlan for the
> inputs.
> > > For
> > > > > some
> > > > > > > > binary operators, the actual physical properties of the
> inputs are
> > > > not
> > > > > as
> > > > > > > > important as the consistency of physical properties among the
> > > inputs.
> > > > > For
> > > > > > > > example, for a sort-based implementation of intersection,
> i.e., an
> > > > > > > > algorithm very similar to merge-join, any sort order of the
> two
> > > > inputs
> > > > > will
> > > > > > > > suffice as long as the two inputs are sorted in the same way.
> > > > > Similarly,
> > > > > > > > for a parallel join, any partitioning of join inputs across
> > > multiple
> > > > > > > > processing nodes is acceptable if both inputs are
> partitioned using
> > > > > > > > Compatible partitioning rules. For these cases, the search
> engine
> > > > > permits
> > > > > > > > the optimizer implementor to specify a number of physical
> property
> > > > > vectors
> > > > > > > > to be tried. For example, for the intersection of two inputs
> R and
> > > S
> > > > > with
> > > > > > > > attributes A, B, and C where R is sorted on (A,B,C) and S is
> sorted
> > > > on
> > > > > > > > (B,A,C), both these sort orders can be specified by the
> optimizer
> > > > > > > > implementor and will be optimized by the generated
> optimizer, while
> > > > > other
> > > > > > > > possible sort orders, e.g., (C,B,A), will be ignored. " [1]
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Stamatis
> > > > > > > >
> > > > > > > > [1]
> > > > > > > >
> > > > >
> > > >
> > >
> https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Volcano-graefe.pdf
> > > > > > > >
> > > > > > > > On Fri, Oct 18, 2019 at 4:56 AM Haisheng Yuan <
> > > > h.yuan@alibaba-inc.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > TL;DR
> > > > > > > > > Both top-down physical TraitSet request and bottom-up
> TraitSet
> > > > > > > > > derivation have their strongth and weakness, we propose
> > > > > > > > > on-demand TraitSet request to combine the above two, to
> reduce
> > > > > > > > > the number of plan alternatives that are genereated,
> especially
> > > > > > > > > in distributed system.
> > > > > > > > >
> > > > > > > > > e.g.
> > > > > > > > > select * from foo join bar on f1=b1 and f2=b2 and f3=b3;
> > > > > > > > >
> > > > > > > > > In non-distributed system, we can generate a sort merge
> join,
> > > > > > > > > requesting foo sorted by f1,f2,f3 and bar sorted by
> b1,b2,b3.
> > > > > > > > > But if foo happens to be sorted by f3,f2,f1, we may miss
> the
> > > > > > > > > chance of making use of the delivered ordering of foo.
> Because
> > > > > > > > > if we require bar to be sorted by b3,b2,b1, we don't need
> to
> > > > > > > > > sort on foo anymore. There are so many choices, n!, not
> even
> > > > > > > > > considering asc/desc and null direction. We can't request
> all
> > > > > > > > > the possible traitsets in top-down way, and can't derive
> all the
> > > > > > > > > possible traitsets in bottom-up way either.
> > > > > > > > >
> > > > > > > > > We propose on-demand traitset request by adding a new type
> > > > > > > > > of metadata DerivedTraitSets into the built-in metadata
> system.
> > > > > > > > >
> > > > > > > > > List<RelTraitSet> deriveTraitSets(RelNode,
> RelMetadataQuery)
> > > > > > > > >
> > > > > > > > > In this metadata, every operator returns several possbile
> > > traitsets
> > > > > > > > > that may be derived from this operator.
> > > > > > > > >
> > > > > > > > > Using above query as an example, the tablescan on foo
> should
> > > > > > > > > return traiset with collation on f3, f2, f1.
> > > > > > > > >
> > > > > > > > > In physical implementation rules, e.g. the
> SortMergeJoinRule,
> > > > > > > > > it gets possible traitsets from both child operators, uses
> the
> > > join
> > > > > > > > > keys to eliminate useless traitsets, leaves out usefull
> traitsets,
> > > > > > > > > and requests corresponding traitset on the other child.
> > > > > > > > >
> > > > > > > > > This relies on the feature of AbstractConverter, which is
> turned
> > > > > > > > > off by default, due to performance issue [1].
> > > > > > > > >
> > > > > > > > > Thoughts?
> > > > > > > > >
> > > > > > > > > [1] https://issues.apache.org/jira/browse/CALCITE-2970
> > > > > > > > >
> > > > > > > > > Haisheng
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > >
> > > > >
> > > >
> > >
> > >
>

Re: Re: [DISCUSS] On-demand traitset request

Posted by Danny Chan <yu...@gmail.com>.

I have the same feeling, it seems to much interfaces for the physical node(we do not really have physical class for physical nodes yet), so these interfaces may just be put into the RelNode, that was too complex and to much for me, can we have a way that do not modify the nodes itself ?

Best,
Danny Chan
在 2019年10月23日 +0800 PM2:53，Stamatis Zampetakis <za...@gmail.com>，写道：
> Overall, I agree that better encapsulation of propagation and derivation of
> traits would be beneficial for our system.
>
> Regarding the API proposed by Haisheng, I have to think a bit more on it.
> At first glance, adding such methods directly in the RelNode API does not
> appear an ideal solution since I don't see how easily it can be extended to
> support other kinds of traits.
>
> Best,
> Stamatis
>
> On Mon, Oct 21, 2019 at 7:31 AM Haisheng Yuan <h....@alibaba-inc.com>
> wrote:
>
> > To Stamatis,
> > Not exactly. My initial thought was giving the physical operator the
> > abiity to customize and fully control physical property derivation
> > strategy, thus can further help the purpose driven trait request. But since
> > we agree to think more high-level API to support on-demand traitset
> > request, I will illustrate what API is expected from implentator's
> > perspective.
> >
> > Jingfeng gave us basic steps on how the plan might be generated using
> > top-down purpose driven only manner, I think differently with the first
> > several steps.
> >
> > SELECT DISTINCT c, b FROM
> > ( SELECT R.c c, S.b b FROM R, S
> > WHERE R.a=S.a and R.b=S.b and R.c=S.c) t;
> >
> > Aggregate . (c, b)
> > +--- MergeJoin . (a, b, c)
> > |--- TableScan on R
> > +-- TableScan on S
> >
> > 1. Aggreate require collation (c,b) from its child, not permutation.
> > 2. MergeJoin's parent require (c,b), it has 2 options. Pass it down, or
> > ignore it.
> > a) Pass down. it has join condition on (a,b,c), the required columns
> > can be coverd by join condition columns, so MergeJoin will try to deliver
> > (c,b,a), and both children must exact match. Then we will have sort on both
> > children of MergeJoin.
> > b) Ignore it. Require its first child collation on (a,b,c), but
> > matching type is subset. R delivers (c,b,a). Then using the first child's
> > derived collation trait to require its second child to exact match. Thus we
> > have a sort on S, and a sort on top of MergeJoin.
> >
> > Both plan might be good or bad. If R, S are large, but the join result is
> > small, plan b) might be better, otherwise plan a) might be better.
> >
> > Anyway, I hope the physical operators can have full control the physical
> > properties requests and derivation, in physical operator class itself, not
> > rules, not other places.
> >
> > Per our experience, we have spent too much time on writing code for
> > dealing with all kinds of property requirement and derivation. But in fact,
> > life should be easier. I would like to the physical operator provides the
> > following API, and the 3rd party implementator just need to
> > override/implement them, no more need to be taken care.
> >
> > 1. void setDistributionRequests(int numReq)
> > Each operator can specify how many optimzation requests on some trait it
> > want to do. e.g. HashJoin may request the following distribution on both
> > children:
> > - (hash distribution on key1, hash distribution on key1)
> > - (hash distribution on key2, hash distribution on key2)
> > - (hash distribution on all keys, hash distribution on all keys)
> > - (Any, Broadcast)
> > - (Gather, Gather)
> >
> > 2. RelDistribution requiredDistribution(RelDistribution required, int
> > child) //same for collation
> > Given the required distribution from parent operator, returns the required
> > distribution for its nth child.
> >
> > 3. RelDistribution derivedDistribution() //same for collation
> > Derive the distribution of the operator itelf from child operators.
> >
> > 4. MatchType distributionMatchType(int child) //same for collation
> > Returns the distribution match type for its nth child, how does it match
> > the other children.
> > Similar with Jinfeng's point, I think there should be 3 types of matching:
> > exact, satisfy, subset.
> > e.g.
> > R is distributed by (a), S is distributed by (a,b)
> > select * from R join S using a,b,c
> > If we have plan
> > HashJoin
> > |-- TableScan on R
> > +-- TableScan on S
> > We may require the match type on S to be satisfy. (a,b) satisfies required
> > distribution (a,b,c).
> > Fot the outer child R, we require it to be exact match with inner.
> >
> > 5. ExecOrder getExecOrder()
> > Returns how the operator's children is executed, left to right, or right
> > to left. Typically, hash join is right to left. We might use this as the
> > optimization order. To make sure we have correct plans, we have to optimize
> > child and enforce properties in the order that is specific to the physical
> > operator.
> > All the other dirty work should be done by the optimization engine, but
> > not through rules, I believe. However, I havn't got any clear plan on how
> > to achieve it inside the engine.
> >
> > Haisheng
> >
> > ------------------------------------------------------------------
> > 发件人：Jacques Nadeau<ja...@apache.org>
> > 日 期：2019年10月21日 11:04:19
> > 收件人：<de...@calcite.apache.org>
> > 主 题：Re: [DISCUSS] On-demand traitset request
> >
> > Definitely agree that this has been a long time missing. I've been
> > challenged by this absence since before Calcite was Calcite. I also
> > remember the trials and tribulations around this that Jinfeng references
> > above.
> >
> > In general, I think the first thing one might want to before actually doing
> > this is to make trait derivation internally defined based on the impact
> > that a rel node has on traits. I've always found the externally provided
> > rel traits to be problematic and a potential place for hidden bugs (row
> > type has the same problem) . It means that trait derivation of a relnode is
> > based on the rules that do transformation as opposed to the "physical"
> > impact of the relnode. (It also leads to derivation behavior for a relnode
> > being scattered in many different rules.) If moved to the rel node, it also
> > provides a second benefit, once you encapsulate this propagation logic, you
> > could also expose this as a trait derivation function that the planner
> > could use to seek out derivation paths.
> >
> > At Dremio we toyed last year with the idea of adding a heuristic cycle on
> > top of the existing volano planner and relset state. In this model a
> > RelNode would have two additional methods: it would expose a trait
> > propagation function (as described above) and optionally expose one or more
> > specific traits this node desired. When the planner arrived at a
> > conclusion, you'd run the heuristic cycle to further propagate desired
> > traits (if possible) and then restart the planning cycle based on any new
> > transformations done during the heuristic stage. You'd then repeat this
> > volcano/trait prop cycle until you arrive at a "completed" state.
> >
> > We never actually got to implementation but I'm super supportive of someone
> > picking this up.
> >
> >
> >
> > On Sat, Oct 19, 2019 at 12:25 AM Stamatis Zampetakis <za...@gmail.com>
> > wrote:
> >
> > > Thanks all for the very interesting usecases and helpful examples.
> > >
> > > I would like to stay a bit on the fact that logical operators do not have
> > > physical traits. Calcite's logical operators do have at least one
> > physical
> > > trait which is Convention.NONE. Other logical operators such as:
> > >
> > > LogicalTableScan [1]
> > > LogicalFilter [2]
> > > LogicalProject [3]
> > > LogicalWindow [4]
> > >
> > > have additional traits regarding collation and distribution. There is
> > > already some sort of trait derivation so to some extend it is possible to
> > > check the traitset of the child (logical) operator before requesting some
> > > other traitset when creating the parent (physical).
> > >
> > > I see that this mechanism of adding explicitly traits to logical
> > operators
> > > may be confusing and may also lead to planning problems. Replacing it by
> > > metadata might be a good idea and it is closer to the idea of
> > > "applicability function" mentioned in the Volcano paper. Assuming that we
> > > follow this approach I would assume that the traitset of logical
> > operators
> > > from now on should be always empty.
> > >
> > > Is this what you have in mind Haisheng?
> > >
> > > Best,
> > > Stamatis
> > >
> > > [1]
> > >
> > >
> > https://github.com/apache/calcite/blob/cd24cae77072e56e4333d10114bf380be79709f1/core/src/main/java/org/apache/calcite/rel/logical/LogicalTableScan.java#L95
> > > [2]
> > >
> > >
> > https://github.com/apache/calcite/blob/cd24cae77072e56e4333d10114bf380be79709f1/core/src/main/java/org/apache/calcite/rel/logical/LogicalFilter.java#L105
> > > [3]
> > >
> > >
> > https://github.com/apache/calcite/blob/cd24cae77072e56e4333d10114bf380be79709f1/core/src/main/java/org/apache/calcite/rel/logical/LogicalProject.java#L104
> > > [4]
> > >
> > >
> > https://github.com/apache/calcite/blob/cd24cae77072e56e4333d10114bf380be79709f1/core/src/main/java/org/apache/calcite/rel/logical/LogicalWindow.java#L95
> > >
> > > On Sat, Oct 19, 2019 at 7:39 AM Xiening Dai <xn...@gmail.com> wrote:
> > >
> > > > Thanks for the sharing. I like the way you model this problem, Jinfeng.
> > > >
> > > > There’s one minor issue with your example. Let say if R and S doesn’t
> > > have
> > > > sorting properties at all. In your case, we would end up adding
> > enforcers
> > > > for LHS and RHS to get collation (a, b, c). Then we would need another
> > > > enforcer to get collation (b, c). This is a sub optimal plan as we
> > could
> > > > have use (b, c, a) for join.
> > > >
> > > > I think in step #2, the join operator would need to take the agg trait
> > > > requirement into account. Then it would have two options -
> > > >
> > > > 1) require *exact/super* match of (b, c, a) or (c, b, a); this is to
> > > > guarantee the join output would deliver the collation agg needs.
> > > > 2) require permutation match of (a, b, c); in such case, an enforcer
> > > might
> > > > be needed for aggregation.
> > > >
> > > > Eventually the cost model decides who is the winner.
> > > >
> > > > There’s a fundamental difference between your model and Haisheng’s
> > > > proposal. In Haisheng’s case, a rel node not only looks at its parent’s
> > > > requirement, but also tries to get the potential traits its input could
> > > > deliver. It would try to align them to eliminate unnecessary
> > > alternatives.
> > > >
> > > > In above example, assuming R is (b, c, a) and S is (a, b, c), to
> > > implement
> > > > option 1), we would generate two alternatives -
> > > >
> > > > MergeJoin (b, c, a)
> > > > TableScan R
> > > > Sort(b, c, a)
> > > > TableScan S
> > > >
> > > > MergeJoin(c, b, a)
> > > > Sort(c, b, a)
> > > > TableScan R
> > > > Sort(c, b, a)
> > > > TableScan S
> > > >
> > > > But if we look at the input traits and has the insight that R already
> > > > delivers (b, c, a), we could decide to require (b, c, a) only and avoid
> > > > generating the 2nd plan, which is definitely worse, and reduce the
> > search
> > > > space.
> > > >
> > > >
> > > > > On Oct 18, 2019, at 4:57 PM, Jinfeng Ni <jn...@apache.org> wrote:
> > > > >
> > > > > A little bit of history. In Drill, when we first implemented
> > > > > Distribution trait's definition, we allows both exact match and
> > > > > partial match in satisfy() method. This works fine for single-input
> > > > > operator such aggregation, however it leads to incorrect plan for
> > join
> > > > > query, i.e LHS shuffle with (a, b), RHS shuffle with (a) . At that
> > > > > time, we removed partial match, and use exact match only. Yet this
> > > > > changes leads to unnecessary additional exchange. To mitigate this
> > > > > problem, in join physical operator, for a join key (a, b, c), we
> > > > > enumerate different distribution requests, yet this lead to more
> > space
> > > > > to explore and significantly increase planning time (which is
> > probably
> > > > > what Haisheng also experienced). When I look back, I feel probably
> > > > > what we miss is the "coordination" step in the join operator, because
> > > > > if we relax the requirement of satisfy(), for multi-input operators,
> > > > > we have to enforce some "coordination", to make sure multiple input's
> > > > > trait could work together properly.
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Oct 18, 2019 at 4:38 PM Jinfeng Ni <jn...@apache.org> wrote:
> > > > > >
> > > > > > This is an interesting topic. Thanks for bringing up this issue.
> > > > > >
> > > > > > My understanding of Volcano planner is it works in a top-down search
> > > > > > mode (the parent asks for certain trait of its child), while the
> > trait
> > > > > > propagates in a bottom-up way, as Stamatis explained.
> > > > > >
> > > > > > IMHO, the issue comes down to the definition of RelTrait, how to
> > > > > > determine if a trait A could satisfy a request asking for trait B,
> > > > > > that is, how RelTrait.satisfies() method is implemented.
> > > > > >
> > > > > > Let's first clarify different situations, using collation as
> > example.
> > > > > > 1) The collation is requested by query's outmost ORDER BY clause.
> > > > > > - The generated plan has to have "exact match", i.e same collation
> > > > > > (same column sequence), or "super match" .
> > > > > > exact match: (a, b) satisfy (a, b)
> > > > > > super match: (a, b, c) satisfy (a, b)
> > > > > >
> > > > > > 2) The collation is requested by operand with single input, such as
> > > > > > sort-based Aggregation.
> > > > > > - In such case, a "permutation match" is sufficient.
> > > > > > For instance, for Aggregation (b,c), input with collation (c, b)
> > > > > > could satisfy the requirement.
> > > > > > permutation match: (b, c) satisfy (c, b). (c, b) satisfy (c,
> > > b)
> > > > > > permutation match: (b, c, a) satisfy (c, b). (c, b, a) satisfy
> > > (c,
> > > > b)
> > > > > >
> > > > > > 3) The collation is requested by operand with >= 2 inputs, such as
> > > > > > sort-based MergeJoin.
> > > > > > - A permutation match is sufficient for each input
> > > > > > - MergeJoin has to do coordination, after input's trait propagates
> > > > > > upwards. In other words, ensure both inputs's permutation match are
> > > > > > actually same sequence. Otherwise, enforcer could be inserted upon
> > > > > > each input, and the planner generates two plans and let the cost
> > > > > > decide.
> > > > > >
> > > > > > For the first case, this is how today's RelCollation's satisfy()
> > > > > > method is implemented.
> > > > > >
> > > > > > For the second / third cases, use Haisheng's example,
> > > > > >
> > > > > > SELECT DISTINCT c, b FROM
> > > > > > ( SELECT R.c c, S.b b FROM R, S
> > > > > > WHERE R.a=S.a and R.b=S.b and R.c=S.c) t;
> > > > > >
> > > > > > Aggregate . (c, b)
> > > > > > +--- MergeJoin . (a, b, c)
> > > > > > |--- TableScan on R
> > > > > > +--- TableScan on S
> > > > > >
> > > > > > Here is the steps that might take place in the planner:
> > > > > >
> > > > > > 1) Aggregate request permutation match collation (c, b)
> > > > > > 2) MergeJoin request a permutation match of (a, b,c) on both it's
> > > input
> > > > > > 3) R respond with collation (c, b, a), which satisfy MergeJoin's LHS
> > > > requirement
> > > > > > 4) S respond with collation (b, c, a), which satisfy MergeJoins' RHS
> > > > requirement
> > > > > > 5) MergeJoin do a coordination o LHS, RHS, and generate two possible
> > > > plans
> > > > > > MJ1: Insert a sort of (c, b, a) on RHS. This MJ operator now has
> > > > > > collation of (c, b, a)
> > > > > > MJ2: Insert a sort of (b, c, a) on LHS. This MJ operator now has
> > > > > > collation of (b, c, a)
> > > > > > 6) MJ1 and MJ2 could both satisfy permutation match request in step
> > > > > > 1, leading to two possible plans:
> > > > > > Agg1: with input of MJ1
> > > > > > Agg2: with input of MJ2
> > > > > > 7) planner chooses a best plan based on cost of Agg1 and Agg2.
> > > > > >
> > > > > > I should point that the enforcer sort inserted in step 5 could help
> > > > > > remove redundant sort in its input, if the input's collation is
> > > > > > obtained from sort, by invoking Calcite's SortRemove Rule.
> > > > > >
> > > > > > The above only considers the column sequence. The DESC/ASC, NULL
> > > > > > FIRST/LAST will add more complexity, but we probably use similar
> > idea.
> > > > > >
> > > > > > In summary, we need :
> > > > > > 1) redefine collation trait's satisfy() policy, exact match, super
> > > > > > match, permutation match,
> > > > > > 2) different physical operator applies different trait matching
> > > > > > policy, depending on operator's # of inputs, and algorithm
> > > > > > implementation.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, Oct 18, 2019 at 2:51 PM Haisheng Yuan <
> > h.yuan@alibaba-inc.com
> > > >
> > > > wrote:
> > > > > > >
> > > > > > > Hi Stamatis,
> > > > > > >
> > > > > > > Thanks for your comment. I think my example didn't make it clear.
> > > > > > >
> > > > > > > When a logical operator is created, it doesn't have any physical,
> > > > > > > propertyand it shouldn't have. When a physical operator is created,
> > > > > > > e.g. in Enumerable convention, it only creates an intuitive
> > traitset
> > > > > > > with it, and requests it children the corresponding ones.
> > > > > > >
> > > > > > > For operators such as Join, Aggregate, Window, which may deliver
> > > > > > > multiple different traitsets, when the parent operator is created
> > and
> > > > > > > request its traitset, it might be good to know what are the
> > poosible
> > > > > > > traitset that the child operator can deliver. e.g.
> > > > > > >
> > > > > > > SELECT DISTINCT c, b FROM
> > > > > > > ( SELECT R.c c, S.b b FROM R, S
> > > > > > > WHERE R.a=S.a and R.b=S.b and R.c=S.c) t;
> > > > > > >
> > > > > > > Suppose R is ordered by (c, b, a), and S is ordered by (b, c, a).
> > > > > > > Here is the logical plan:
> > > > > > > Aggregate
> > > > > > > +--- InnerJoin
> > > > > > > |--- TableScan on R
> > > > > > > +--- TableScan on S
> > > > > > >
> > > > > > > When we create a physical merge join for the inner join, it may
> > just
> > > > > > > have collation sorted on a,b,c. Then the aggreate on top of join
> > will
> > > > > > > request another sort on c,b, thus we miss the best plan. What we
> > > > > > > can do is requesting all the order combinations, which is n!, like
> > > > > > > how the Values operator does. But that is too much.
> > > > > > >
> > > > > > > If we can provide an approach that can minimize the possiple
> > traitset
> > > > > > > that the child operator may deliver, we can reduce the chance of
> > > > missing
> > > > > > > good plans. For the above query, the Aggregate operator can derive
> > > > > > > possible traitsets that its child operator join can deliver, in
> > which
> > > > case,
> > > > > > > the possiple traitsets of join is
> > > > > > > 1. collation on (a,b,c) based on join condition,
> > > > > > > 2. collation on (c,b,a) based on left child,
> > > > > > > 3. collation on (b,c,a) based on right child
> > > > > > > So we can request Aggregate sorted by (c,b) and Join sorted by
> > > (c,b,a).
> > > > > > > The number of traiset requests and plan alternatives can be
> > reduced.
> > > > > > > The DerivedTraitSets can be used to derive the possible traitsets
> > > from
> > > > > > > Join, and pass through Project, Filter etc...
> > > > > > >
> > > > > > > This is just an example of non-distributed system, for distributed
> > > > system,
> > > > > > > it can save much more by considering the possible distribution
> > > > delivered
> > > > > > > by child operators.
> > > > > > >
> > > > > > > One thing that concerns me is it highly relies on the traiset
> > system
> > > > of the
> > > > > > > underlying physical system. Like Enumerable doesn't consider
> > > > distribution,
> > > > > > > because it is single-node system, but Hive/Flink are distributed
> > > > system.
> > > > > > > - Haisheng
> > > > > > >
> > > > > > > ------------------------------------------------------------------
> > > > > > > 发件人：Stamatis Zampetakis<za...@gmail.com>
> > > > > > > 日 期：2019年10月18日 14:53:41
> > > > > > > 收件人：<de...@calcite.apache.org>
> > > > > > > 主 题：Re: [DISCUSS] On-demand traitset request
> > > > > > >
> > > > > > > Hi Haisheng,
> > > > > > >
> > > > > > > This is an interesting topic but somehow in my mind I thought that
> > > this
> > > > > > > mechanism is already in place.
> > > > > > >
> > > > > > > When an operator (logical or physical) is created its traitset is
> > > > > > > determined in bottom-up fashion using the create
> > > > > > > static factory method present in almost all operators. In my mind
> > > this
> > > > is
> > > > > > > in some sense the applicability function
> > > > > > > mentioned in [1].
> > > > > > >
> > > > > > > Now during optimization we proceed in top-down manner and we
> > request
> > > > > > > certain traitsets from the operators.
> > > > > > > If it happens and they contain already the requested traits nothing
> > > > needs
> > > > > > > to be done.
> > > > > > >
> > > > > > > In your example when we are about to create the sort-merge join we
> > > can
> > > > > > > check what traitsets are present in the inputs
> > > > > > > and if possible request those. Can you elaborate a bit more why do
> > we
> > > > need
> > > > > > > a new type of metadata?
> > > > > > >
> > > > > > > Anyway if we cannot do it at the moment it makes sense to complete
> > > the
> > > > > > > missing bits since what you are describing
> > > > > > > was already mentioned in the original design of the Volcano
> > optimizer
> > > > [1].
> > > > > > >
> > > > > > > "If a move to be pursued is the exploration of a normal query
> > > > processing
> > > > > > > algorithm such as merge-join, its cost is calculated by the
> > > algorithm's
> > > > > > > cost function. The algorithm's applicability function determines
> > the
> > > > > > > physical properly vectors for the algorithms inputs, and their
> > costs
> > > > and
> > > > > > > optimal plans are found by invoking FindBestPlan for the inputs.
> > For
> > > > some
> > > > > > > binary operators, the actual physical properties of the inputs are
> > > not
> > > > as
> > > > > > > important as the consistency of physical properties among the
> > inputs.
> > > > For
> > > > > > > example, for a sort-based implementation of intersection, i.e., an
> > > > > > > algorithm very similar to merge-join, any sort order of the two
> > > inputs
> > > > will
> > > > > > > suffice as long as the two inputs are sorted in the same way.
> > > > Similarly,
> > > > > > > for a parallel join, any partitioning of join inputs across
> > multiple
> > > > > > > processing nodes is acceptable if both inputs are partitioned using
> > > > > > > Compatible partitioning rules. For these cases, the search engine
> > > > permits
> > > > > > > the optimizer implementor to specify a number of physical property
> > > > vectors
> > > > > > > to be tried. For example, for the intersection of two inputs R and
> > S
> > > > with
> > > > > > > attributes A, B, and C where R is sorted on (A,B,C) and S is sorted
> > > on
> > > > > > > (B,A,C), both these sort orders can be specified by the optimizer
> > > > > > > implementor and will be optimized by the generated optimizer, while
> > > > other
> > > > > > > possible sort orders, e.g., (C,B,A), will be ignored. " [1]
> > > > > > >
> > > > > > > Best,
> > > > > > > Stamatis
> > > > > > >
> > > > > > > [1]
> > > > > > >
> > > >
> > >
> > https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Volcano-graefe.pdf
> > > > > > >
> > > > > > > On Fri, Oct 18, 2019 at 4:56 AM Haisheng Yuan <
> > > h.yuan@alibaba-inc.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > TL;DR
> > > > > > > > Both top-down physical TraitSet request and bottom-up TraitSet
> > > > > > > > derivation have their strongth and weakness, we propose
> > > > > > > > on-demand TraitSet request to combine the above two, to reduce
> > > > > > > > the number of plan alternatives that are genereated, especially
> > > > > > > > in distributed system.
> > > > > > > >
> > > > > > > > e.g.
> > > > > > > > select * from foo join bar on f1=b1 and f2=b2 and f3=b3;
> > > > > > > >
> > > > > > > > In non-distributed system, we can generate a sort merge join,
> > > > > > > > requesting foo sorted by f1,f2,f3 and bar sorted by b1,b2,b3.
> > > > > > > > But if foo happens to be sorted by f3,f2,f1, we may miss the
> > > > > > > > chance of making use of the delivered ordering of foo. Because
> > > > > > > > if we require bar to be sorted by b3,b2,b1, we don't need to
> > > > > > > > sort on foo anymore. There are so many choices, n!, not even
> > > > > > > > considering asc/desc and null direction. We can't request all
> > > > > > > > the possible traitsets in top-down way, and can't derive all the
> > > > > > > > possible traitsets in bottom-up way either.
> > > > > > > >
> > > > > > > > We propose on-demand traitset request by adding a new type
> > > > > > > > of metadata DerivedTraitSets into the built-in metadata system.
> > > > > > > >
> > > > > > > > List<RelTraitSet> deriveTraitSets(RelNode, RelMetadataQuery)
> > > > > > > >
> > > > > > > > In this metadata, every operator returns several possbile
> > traitsets
> > > > > > > > that may be derived from this operator.
> > > > > > > >
> > > > > > > > Using above query as an example, the tablescan on foo should
> > > > > > > > return traiset with collation on f3, f2, f1.
> > > > > > > >
> > > > > > > > In physical implementation rules, e.g. the SortMergeJoinRule,
> > > > > > > > it gets possible traitsets from both child operators, uses the
> > join
> > > > > > > > keys to eliminate useless traitsets, leaves out usefull traitsets,
> > > > > > > > and requests corresponding traitset on the other child.
> > > > > > > >
> > > > > > > > This relies on the feature of AbstractConverter, which is turned
> > > > > > > > off by default, due to performance issue [1].
> > > > > > > >
> > > > > > > > Thoughts?
> > > > > > > >
> > > > > > > > [1] https://issues.apache.org/jira/browse/CALCITE-2970
> > > > > > > >
> > > > > > > > Haisheng
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > >
> > > >
> > >
> >
> >

Re: Re: [DISCUSS] On-demand traitset request

Posted by Stamatis Zampetakis <za...@gmail.com>.

Overall, I agree that better encapsulation of propagation and derivation of
traits would be beneficial for our system.

Regarding the API proposed by Haisheng, I have to think a bit more on it.
At first glance, adding such methods directly in the RelNode API does not
appear an ideal solution since I don't see how easily it can be extended to
support other kinds of traits.

Best,
Stamatis

On Mon, Oct 21, 2019 at 7:31 AM Haisheng Yuan <h....@alibaba-inc.com>
wrote:

> To Stamatis,
> Not exactly. My initial thought was giving the physical operator the
> abiity to customize and fully control physical property derivation
> strategy, thus can further help the purpose driven trait request. But since
> we agree to think more high-level API to support on-demand traitset
> request, I will illustrate what API is expected from implentator's
> perspective.
>
> Jingfeng gave us basic steps on how the plan might be generated using
> top-down purpose driven only manner, I think differently with the first
> several steps.
>
> SELECT DISTINCT c, b FROM
>   ( SELECT R.c c, S.b b FROM R, S
>         WHERE R.a=S.a and R.b=S.b and R.c=S.c) t;
>
> Aggregate . (c, b)
>     +--- MergeJoin . (a, b, c)
>                 |--- TableScan on R
>                 +-- TableScan on S
>
> 1. Aggreate require collation (c,b) from its child, not permutation.
> 2. MergeJoin's parent require (c,b), it has 2 options. Pass it down, or
> ignore it.
>    a) Pass down. it has join condition on (a,b,c), the required columns
> can be coverd by join condition columns, so MergeJoin will try to deliver
> (c,b,a), and both children must exact match. Then we will have sort on both
> children of MergeJoin.
>    b) Ignore it. Require its first child collation on (a,b,c), but
> matching type is subset. R delivers (c,b,a). Then using the first child's
> derived collation trait to require its second child to exact match. Thus we
> have a sort on S, and a sort on top of MergeJoin.
>
> Both plan might be good or bad. If R, S are large, but the join result is
> small, plan b) might be better, otherwise plan a) might be better.
>
> Anyway, I hope the physical operators can have full control the physical
> properties requests and derivation, in physical operator class itself, not
> rules, not other places.
>
> Per our experience, we have spent too much time on writing code for
> dealing with all kinds of property requirement and derivation. But in fact,
> life should be easier. I would like to the physical operator provides the
> following API, and the 3rd party implementator just need to
> override/implement them, no more need to be taken care.
>
> 1. void setDistributionRequests(int numReq)
> Each operator can specify how many optimzation requests on some trait it
> want to do. e.g. HashJoin may request the following distribution on both
> children:
>  - (hash distribution on key1, hash distribution on key1)
>  - (hash distribution on key2, hash distribution on key2)
>  - (hash distribution on all keys, hash distribution on all keys)
>  - (Any, Broadcast)
>  - (Gather, Gather)
>
> 2. RelDistribution requiredDistribution(RelDistribution required, int
> child) //same for collation
> Given the required distribution from parent operator, returns the required
> distribution for its nth child.
>
> 3. RelDistribution derivedDistribution() //same for collation
> Derive the distribution of the operator itelf from child operators.
>
> 4. MatchType distributionMatchType(int child)  //same for collation
> Returns the distribution match type for its nth child, how does it match
> the other children.
> Similar with Jinfeng's point, I think there should be 3 types of matching:
> exact, satisfy, subset.
> e.g.
> R is distributed by (a), S is distributed by (a,b)
> select * from R join S using a,b,c
> If we have plan
> HashJoin
>     |-- TableScan on R
>     +-- TableScan on S
> We may require the match type on S to be satisfy. (a,b) satisfies required
> distribution (a,b,c).
> Fot the outer child R, we require it to be exact match with inner.
>
> 5. ExecOrder getExecOrder()
> Returns how the operator's children is executed, left to right, or right
> to left. Typically, hash join is right to left. We might use this as the
> optimization order. To make sure we have correct plans, we have to optimize
> child and enforce properties in the order that is specific to the physical
> operator.
> All the other dirty work should be done by the optimization engine, but
> not through rules, I believe. However, I havn't got any clear plan on how
> to achieve it inside the engine.
>
> Haisheng
>
> ------------------------------------------------------------------
> 发件人：Jacques Nadeau<ja...@apache.org>
> 日 期：2019年10月21日 11:04:19
> 收件人：<de...@calcite.apache.org>
> 主 题：Re: [DISCUSS] On-demand traitset request
>
> Definitely agree that this has been a long time missing. I've been
> challenged by this absence since before Calcite was Calcite. I also
> remember the trials and tribulations around this that Jinfeng references
> above.
>
> In general, I think the first thing one might want to before actually doing
> this is to make trait derivation internally defined based on the impact
> that a rel node has on traits. I've always found the externally provided
> rel traits to be problematic and a potential place for hidden bugs (row
> type has the same problem) . It means that trait derivation of a relnode is
> based on the rules that do transformation as opposed to the "physical"
> impact of the relnode. (It also leads to derivation behavior for a relnode
> being scattered in many different rules.) If moved to the rel node, it also
> provides a second benefit, once you encapsulate this propagation logic, you
> could also expose this as a trait derivation function that the planner
> could use to seek out derivation paths.
>
> At Dremio we toyed last year with the idea of adding a heuristic cycle on
> top of the existing volano planner and relset state. In this model a
> RelNode would have two additional methods: it would expose a trait
> propagation function (as described above) and optionally expose one or more
> specific traits this node desired. When the planner arrived at a
> conclusion, you'd run the heuristic cycle to further propagate desired
> traits (if possible) and then restart the planning cycle based on any new
> transformations done during the heuristic stage. You'd then repeat this
> volcano/trait prop cycle until you arrive at a "completed" state.
>
> We never actually got to implementation but I'm super supportive of someone
> picking this up.
>
>
>
> On Sat, Oct 19, 2019 at 12:25 AM Stamatis Zampetakis <za...@gmail.com>
> wrote:
>
> > Thanks all for the very interesting usecases and helpful examples.
> >
> > I would like to stay a bit on the fact that logical operators do not have
> > physical traits. Calcite's logical operators do have at least one
> physical
> > trait which is Convention.NONE. Other logical operators such as:
> >
> > LogicalTableScan [1]
> > LogicalFilter [2]
> > LogicalProject [3]
> > LogicalWindow [4]
> >
> > have additional traits regarding collation and distribution. There is
> > already some sort of trait derivation so to some extend it is possible to
> > check the traitset of the child (logical) operator before requesting some
> > other traitset when creating the parent (physical).
> >
> > I see that this mechanism of adding explicitly traits to logical
> operators
> > may be confusing and may also lead to planning problems. Replacing it by
> > metadata might be a good idea and it is closer to the idea of
> > "applicability function" mentioned in the Volcano paper. Assuming that we
> > follow this approach I would assume that the traitset of logical
> operators
> > from now on should be always empty.
> >
> > Is this what you have in mind Haisheng?
> >
> > Best,
> > Stamatis
> >
> > [1]
> >
> >
> https://github.com/apache/calcite/blob/cd24cae77072e56e4333d10114bf380be79709f1/core/src/main/java/org/apache/calcite/rel/logical/LogicalTableScan.java#L95
> > [2]
> >
> >
> https://github.com/apache/calcite/blob/cd24cae77072e56e4333d10114bf380be79709f1/core/src/main/java/org/apache/calcite/rel/logical/LogicalFilter.java#L105
> > [3]
> >
> >
> https://github.com/apache/calcite/blob/cd24cae77072e56e4333d10114bf380be79709f1/core/src/main/java/org/apache/calcite/rel/logical/LogicalProject.java#L104
> > [4]
> >
> >
> https://github.com/apache/calcite/blob/cd24cae77072e56e4333d10114bf380be79709f1/core/src/main/java/org/apache/calcite/rel/logical/LogicalWindow.java#L95
> >
> > On Sat, Oct 19, 2019 at 7:39 AM Xiening Dai <xn...@gmail.com> wrote:
> >
> > > Thanks for the sharing. I like the way you model this problem, Jinfeng.
> > >
> > > There’s one minor issue with your example. Let say if R and S doesn’t
> > have
> > > sorting properties at all. In your case, we would end up adding
> enforcers
> > > for LHS and RHS to get collation (a, b, c). Then we would need another
> > > enforcer to get collation (b, c). This is a sub optimal plan as we
> could
> > > have use (b, c, a) for join.
> > >
> > > I think in step #2, the join operator would need to take the agg trait
> > > requirement into account. Then it would have two options -
> > >
> > > 1) require *exact/super* match of (b, c, a) or (c, b, a); this is to
> > > guarantee the join output would deliver the collation agg needs.
> > > 2) require permutation match of (a, b, c); in such case, an enforcer
> > might
> > > be needed for aggregation.
> > >
> > > Eventually the cost model decides who is the winner.
> > >
> > > There’s a fundamental difference between your model and Haisheng’s
> > > proposal. In Haisheng’s case, a rel node not only looks at its parent’s
> > > requirement, but also tries to get the potential traits its input could
> > > deliver. It would try to align them to eliminate unnecessary
> > alternatives.
> > >
> > > In above example, assuming R is (b, c, a) and S is (a, b, c), to
> > implement
> > > option 1), we would generate two alternatives -
> > >
> > > MergeJoin (b, c, a)
> > > TableScan R
> > > Sort(b, c, a)
> > > TableScan S
> > >
> > > MergeJoin(c, b, a)
> > > Sort(c, b, a)
> > > TableScan R
> > > Sort(c, b, a)
> > > TableScan S
> > >
> > > But if we look at the input traits and has the insight that R already
> > > delivers (b, c, a), we could decide to require (b, c, a) only and avoid
> > > generating the 2nd plan, which is definitely worse, and reduce the
> search
> > > space.
> > >
> > >
> > > > On Oct 18, 2019, at 4:57 PM, Jinfeng Ni <jn...@apache.org> wrote:
> > > >
> > > > A little bit of history. In Drill, when we first implemented
> > > > Distribution trait's definition, we allows both exact match and
> > > > partial match in satisfy() method. This works fine for single-input
> > > > operator such aggregation, however it leads to incorrect plan for
> join
> > > > query, i.e LHS shuffle with (a, b), RHS shuffle with (a) . At that
> > > > time, we removed partial match, and use exact match only. Yet this
> > > > changes leads to unnecessary additional exchange. To mitigate this
> > > > problem, in join physical operator, for a join key (a, b, c), we
> > > > enumerate different distribution requests, yet this lead to more
> space
> > > > to explore and significantly increase planning time (which is
> probably
> > > > what Haisheng also experienced). When I look back, I feel probably
> > > > what we miss is the "coordination" step in the join operator, because
> > > > if we relax the requirement of satisfy(), for multi-input operators,
> > > > we have to enforce some "coordination", to make sure multiple input's
> > > > trait could work together properly.
> > > >
> > > >
> > > >
> > > > On Fri, Oct 18, 2019 at 4:38 PM Jinfeng Ni <jn...@apache.org> wrote:
> > > >>
> > > >> This is an interesting topic. Thanks for bringing up this issue.
> > > >>
> > > >> My understanding of Volcano planner is it works in a top-down search
> > > >> mode (the parent asks for certain trait of its child), while the
> trait
> > > >> propagates in a bottom-up way, as Stamatis explained.
> > > >>
> > > >> IMHO, the issue comes down to the definition of RelTrait, how to
> > > >> determine if a trait A could satisfy a request asking for trait B,
> > > >> that is, how RelTrait.satisfies() method is implemented.
> > > >>
> > > >> Let's first clarify different situations, using collation as
> example.
> > > >> 1) The collation is requested by query's outmost ORDER BY clause.
> > > >> - The generated plan has to have "exact match", i.e same collation
> > > >> (same column sequence), or "super match" .
> > > >> exact match: (a, b) satisfy (a, b)
> > > >> super match: (a, b, c) satisfy (a, b)
> > > >>
> > > >> 2) The collation is requested by operand with single input, such as
> > > >> sort-based Aggregation.
> > > >> - In such case, a "permutation match" is sufficient.
> > > >> For instance, for Aggregation (b,c), input with collation (c, b)
> > > >> could satisfy the requirement.
> > > >> permutation match: (b, c) satisfy (c, b). (c, b) satisfy (c,
> > b)
> > > >> permutation match: (b, c, a) satisfy (c, b). (c, b, a) satisfy
> > (c,
> > > b)
> > > >>
> > > >> 3) The collation is requested by operand with >= 2 inputs, such as
> > > >> sort-based MergeJoin.
> > > >> - A permutation match is sufficient for each input
> > > >> - MergeJoin has to do coordination, after input's trait propagates
> > > >> upwards. In other words, ensure both inputs's permutation match are
> > > >> actually same sequence. Otherwise, enforcer could be inserted upon
> > > >> each input, and the planner generates two plans and let the cost
> > > >> decide.
> > > >>
> > > >> For the first case, this is how today's RelCollation's satisfy()
> > > >> method is implemented.
> > > >>
> > > >> For the second / third cases, use Haisheng's example,
> > > >>
> > > >> SELECT DISTINCT c, b FROM
> > > >> ( SELECT R.c c, S.b b FROM R, S
> > > >> WHERE R.a=S.a and R.b=S.b and R.c=S.c) t;
> > > >>
> > > >> Aggregate . (c, b)
> > > >> +--- MergeJoin . (a, b, c)
> > > >> |--- TableScan on R
> > > >> +--- TableScan on S
> > > >>
> > > >> Here is the steps that might take place in the planner:
> > > >>
> > > >> 1) Aggregate request permutation match collation (c, b)
> > > >> 2) MergeJoin request a permutation match of (a, b,c) on both it's
> > input
> > > >> 3) R respond with collation (c, b, a), which satisfy MergeJoin's LHS
> > > requirement
> > > >> 4) S respond with collation (b, c, a), which satisfy MergeJoins' RHS
> > > requirement
> > > >> 5) MergeJoin do a coordination o LHS, RHS, and generate two possible
> > > plans
> > > >> MJ1: Insert a sort of (c, b, a) on RHS. This MJ operator now has
> > > >> collation of (c, b, a)
> > > >> MJ2: Insert a sort of (b, c, a) on LHS. This MJ operator now has
> > > >> collation of (b, c, a)
> > > >> 6) MJ1 and MJ2 could both satisfy permutation match request in step
> > > >> 1, leading to two possible plans:
> > > >> Agg1: with input of MJ1
> > > >> Agg2: with input of MJ2
> > > >> 7) planner chooses a best plan based on cost of Agg1 and Agg2.
> > > >>
> > > >> I should point that the enforcer sort inserted in step 5 could help
> > > >> remove redundant sort in its input, if the input's collation is
> > > >> obtained from sort, by invoking Calcite's SortRemove Rule.
> > > >>
> > > >> The above only considers the column sequence. The DESC/ASC, NULL
> > > >> FIRST/LAST will add more complexity, but we probably use similar
> idea.
> > > >>
> > > >> In summary, we need :
> > > >> 1) redefine collation trait's satisfy() policy, exact match, super
> > > >> match, permutation match,
> > > >> 2) different physical operator applies different trait matching
> > > >> policy, depending on operator's # of inputs, and algorithm
> > > >> implementation.
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> On Fri, Oct 18, 2019 at 2:51 PM Haisheng Yuan <
> h.yuan@alibaba-inc.com
> > >
> > > wrote:
> > > >>>
> > > >>> Hi Stamatis,
> > > >>>
> > > >>> Thanks for your comment. I think my example didn't make it clear.
> > > >>>
> > > >>> When a logical operator is created, it doesn't have any physical,
> > > >>> propertyand it shouldn't have. When a physical operator is created,
> > > >>> e.g. in Enumerable convention, it only creates an intuitive
> traitset
> > > >>> with it, and requests it children the corresponding ones.
> > > >>>
> > > >>> For operators such as Join, Aggregate, Window, which may deliver
> > > >>> multiple different traitsets, when the parent operator is created
> and
> > > >>> request its traitset, it might be good to know what are the
> poosible
> > > >>> traitset that the child operator can deliver. e.g.
> > > >>>
> > > >>> SELECT DISTINCT c, b FROM
> > > >>> ( SELECT R.c c, S.b b FROM R, S
> > > >>> WHERE R.a=S.a and R.b=S.b and R.c=S.c) t;
> > > >>>
> > > >>> Suppose R is ordered by (c, b, a), and S is ordered by (b, c, a).
> > > >>> Here is the logical plan:
> > > >>> Aggregate
> > > >>> +--- InnerJoin
> > > >>> |--- TableScan on R
> > > >>> +--- TableScan on S
> > > >>>
> > > >>> When we create a physical merge join for the inner join, it may
> just
> > > >>> have collation sorted on a,b,c. Then the aggreate on top of join
> will
> > > >>> request another sort on c,b, thus we miss the best plan. What we
> > > >>> can do is requesting all the order combinations, which is n!, like
> > > >>> how the Values operator does. But that is too much.
> > > >>>
> > > >>> If we can provide an approach that can minimize the possiple
> traitset
> > > >>> that the child operator may deliver, we can reduce the chance of
> > > missing
> > > >>> good plans. For the above query, the Aggregate operator can derive
> > > >>> possible traitsets that its child operator join can deliver, in
> which
> > > case,
> > > >>> the possiple traitsets of join is
> > > >>> 1. collation on (a,b,c) based on join condition,
> > > >>> 2. collation on (c,b,a) based on left child,
> > > >>> 3. collation on (b,c,a) based on right child
> > > >>> So we can request Aggregate sorted by (c,b) and Join sorted by
> > (c,b,a).
> > > >>> The number of traiset requests and plan alternatives can be
> reduced.
> > > >>> The DerivedTraitSets can be used to derive the possible traitsets
> > from
> > > >>> Join, and pass through Project, Filter etc...
> > > >>>
> > > >>> This is just an example of non-distributed system, for distributed
> > > system,
> > > >>> it can save much more by considering the possible distribution
> > > delivered
> > > >>> by child operators.
> > > >>>
> > > >>> One thing that concerns me is it highly relies on the traiset
> system
> > > of the
> > > >>> underlying physical system. Like Enumerable doesn't consider
> > > distribution,
> > > >>> because it is single-node system, but Hive/Flink are distributed
> > > system.
> > > >>> - Haisheng
> > > >>>
> > > >>> ------------------------------------------------------------------
> > > >>> 发件人：Stamatis Zampetakis<za...@gmail.com>
> > > >>> 日 期：2019年10月18日 14:53:41
> > > >>> 收件人：<de...@calcite.apache.org>
> > > >>> 主 题：Re: [DISCUSS] On-demand traitset request
> > > >>>
> > > >>> Hi Haisheng,
> > > >>>
> > > >>> This is an interesting topic but somehow in my mind I thought that
> > this
> > > >>> mechanism is already in place.
> > > >>>
> > > >>> When an operator (logical or physical) is created its traitset is
> > > >>> determined in bottom-up fashion using the create
> > > >>> static factory method present in almost all operators. In my mind
> > this
> > > is
> > > >>> in some sense the applicability function
> > > >>> mentioned in [1].
> > > >>>
> > > >>> Now during optimization we proceed in top-down manner and we
> request
> > > >>> certain traitsets from the operators.
> > > >>> If it happens and they contain already the requested traits nothing
> > > needs
> > > >>> to be done.
> > > >>>
> > > >>> In your example when we are about to create the sort-merge join we
> > can
> > > >>> check what traitsets are present in the inputs
> > > >>> and if possible request those. Can you elaborate a bit more why do
> we
> > > need
> > > >>> a new type of metadata?
> > > >>>
> > > >>> Anyway if we cannot do it at the moment it makes sense to complete
> > the
> > > >>> missing bits since what you are describing
> > > >>> was already mentioned in the original design of the Volcano
> optimizer
> > > [1].
> > > >>>
> > > >>> "If a move to be pursued is the exploration of a normal query
> > > processing
> > > >>> algorithm such as merge-join, its cost is calculated by the
> > algorithm's
> > > >>> cost function. The algorithm's applicability function determines
> the
> > > >>> physical properly vectors for the algorithms inputs, and their
> costs
> > > and
> > > >>> optimal plans are found by invoking FindBestPlan for the inputs.
> For
> > > some
> > > >>> binary operators, the actual physical properties of the inputs are
> > not
> > > as
> > > >>> important as the consistency of physical properties among the
> inputs.
> > > For
> > > >>> example, for a sort-based implementation of intersection, i.e., an
> > > >>> algorithm very similar to merge-join, any sort order of the two
> > inputs
> > > will
> > > >>> suffice as long as the two inputs are sorted in the same way.
> > > Similarly,
> > > >>> for a parallel join, any partitioning of join inputs across
> multiple
> > > >>> processing nodes is acceptable if both inputs are partitioned using
> > > >>> Compatible partitioning rules. For these cases, the search engine
> > > permits
> > > >>> the optimizer implementor to specify a number of physical property
> > > vectors
> > > >>> to be tried. For example, for the intersection of two inputs R and
> S
> > > with
> > > >>> attributes A, B, and C where R is sorted on (A,B,C) and S is sorted
> > on
> > > >>> (B,A,C), both these sort orders can be specified by the optimizer
> > > >>> implementor and will be optimized by the generated optimizer, while
> > > other
> > > >>> possible sort orders, e.g., (C,B,A), will be ignored. " [1]
> > > >>>
> > > >>> Best,
> > > >>> Stamatis
> > > >>>
> > > >>> [1]
> > > >>>
> > >
> >
> https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Volcano-graefe.pdf
> > > >>>
> > > >>> On Fri, Oct 18, 2019 at 4:56 AM Haisheng Yuan <
> > h.yuan@alibaba-inc.com>
> > > >>> wrote:
> > > >>>
> > > >>>> TL;DR
> > > >>>> Both top-down physical TraitSet request and bottom-up TraitSet
> > > >>>> derivation have their strongth and weakness, we propose
> > > >>>> on-demand TraitSet request to combine the above two, to reduce
> > > >>>> the number of plan alternatives that are genereated, especially
> > > >>>> in distributed system.
> > > >>>>
> > > >>>> e.g.
> > > >>>> select * from foo join bar on f1=b1 and f2=b2 and f3=b3;
> > > >>>>
> > > >>>> In non-distributed system, we can generate a sort merge join,
> > > >>>> requesting foo sorted by f1,f2,f3 and bar sorted by b1,b2,b3.
> > > >>>> But if foo happens to be sorted by f3,f2,f1, we may miss the
> > > >>>> chance of making use of the delivered ordering of foo. Because
> > > >>>> if we require bar to be sorted by b3,b2,b1, we don't need to
> > > >>>> sort on foo anymore. There are so many choices, n!, not even
> > > >>>> considering asc/desc and null direction. We can't request all
> > > >>>> the possible traitsets in top-down way, and can't derive all the
> > > >>>> possible traitsets in bottom-up way either.
> > > >>>>
> > > >>>> We propose on-demand traitset request by adding a new type
> > > >>>> of metadata DerivedTraitSets into the built-in metadata system.
> > > >>>>
> > > >>>> List<RelTraitSet> deriveTraitSets(RelNode, RelMetadataQuery)
> > > >>>>
> > > >>>> In this metadata, every operator returns several possbile
> traitsets
> > > >>>> that may be derived from this operator.
> > > >>>>
> > > >>>> Using above query as an example, the tablescan on foo should
> > > >>>> return traiset with collation on f3, f2, f1.
> > > >>>>
> > > >>>> In physical implementation rules, e.g. the SortMergeJoinRule,
> > > >>>> it gets possible traitsets from both child operators, uses the
> join
> > > >>>> keys to eliminate useless traitsets, leaves out usefull traitsets,
> > > >>>> and requests corresponding traitset on the other child.
> > > >>>>
> > > >>>> This relies on the feature of AbstractConverter, which is turned
> > > >>>> off by default, due to performance issue [1].
> > > >>>>
> > > >>>> Thoughts?
> > > >>>>
> > > >>>> [1] https://issues.apache.org/jira/browse/CALCITE-2970
> > > >>>>
> > > >>>> Haisheng
> > > >>>>
> > > >>>>
> > > >>>
> > >
> > >
> >
>
>

Re: Re: [DISCUSS] On-demand traitset request

Posted by Haisheng Yuan <h....@alibaba-inc.com>.

To Stamatis,
Not exactly. My initial thought was giving the physical operator the abiity to customize and fully control physical property derivation strategy, thus can further help the purpose driven trait request. But since we agree to think more high-level API to support on-demand traitset request, I will illustrate what API is expected from implentator's perspective.

Jingfeng gave us basic steps on how the plan might be generated using top-down purpose driven only manner, I think differently with the first several steps.

SELECT DISTINCT c, b FROM
  ( SELECT R.c c, S.b b FROM R, S
        WHERE R.a=S.a and R.b=S.b and R.c=S.c) t;

Aggregate . (c, b)
    +--- MergeJoin . (a, b, c)
                |--- TableScan on R
                +-- TableScan on S

1. Aggreate require collation (c,b) from its child, not permutation.
2. MergeJoin's parent require (c,b), it has 2 options. Pass it down, or ignore it.
   a) Pass down. it has join condition on (a,b,c), the required columns can be coverd by join condition columns, so MergeJoin will try to deliver (c,b,a), and both children must exact match. Then we will have sort on both children of MergeJoin.
   b) Ignore it. Require its first child collation on (a,b,c), but matching type is subset. R delivers (c,b,a). Then using the first child's derived collation trait to require its second child to exact match. Thus we have a sort on S, and a sort on top of MergeJoin.

Both plan might be good or bad. If R, S are large, but the join result is small, plan b) might be better, otherwise plan a) might be better. 

Anyway, I hope the physical operators can have full control the physical properties requests and derivation, in physical operator class itself, not rules, not other places.

Per our experience, we have spent too much time on writing code for dealing with all kinds of property requirement and derivation. But in fact, life should be easier. I would like to the physical operator provides the following API, and the 3rd party implementator just need to override/implement them, no more need to be taken care.

1. void setDistributionRequests(int numReq)
Each operator can specify how many optimzation requests on some trait it want to do. e.g. HashJoin may request the following distribution on both children:
 - (hash distribution on key1, hash distribution on key1)
 - (hash distribution on key2, hash distribution on key2)
 - (hash distribution on all keys, hash distribution on all keys)
 - (Any, Broadcast)
 - (Gather, Gather)

2. RelDistribution requiredDistribution(RelDistribution required, int child) //same for collation
Given the required distribution from parent operator, returns the required distribution for its nth child.

3. RelDistribution derivedDistribution() //same for collation
Derive the distribution of the operator itelf from child operators.

4. MatchType distributionMatchType(int child)  //same for collation
Returns the distribution match type for its nth child, how does it match the other children. 
Similar with Jinfeng's point, I think there should be 3 types of matching: exact, satisfy, subset.
e.g. 
R is distributed by (a), S is distributed by (a,b)
select * from R join S using a,b,c
If we have plan
HashJoin
    |-- TableScan on R
    +-- TableScan on S
We may require the match type on S to be satisfy. (a,b) satisfies required distribution (a,b,c). 
Fot the outer child R, we require it to be exact match with inner. 

5. ExecOrder getExecOrder()
Returns how the operator's children is executed, left to right, or right to left. Typically, hash join is right to left. We might use this as the optimization order. To make sure we have correct plans, we have to optimize child and enforce properties in the order that is specific to the physical operator.
All the other dirty work should be done by the optimization engine, but not through rules, I believe. However, I havn't got any clear plan on how to achieve it inside the engine. 

Haisheng

------------------------------------------------------------------
发件人：Jacques Nadeau<ja...@apache.org>
日　期：2019年10月21日 11:04:19
收件人：<de...@calcite.apache.org>
主　题：Re: [DISCUSS] On-demand traitset request

Definitely agree that this has been a long time missing. I've been
challenged by this absence since before Calcite was Calcite. I also
remember the trials and tribulations around this that Jinfeng references
above.

In general, I think the first thing one might want to before actually doing
this is to make trait derivation internally defined based on the impact
that a rel node has on traits. I've always found the externally provided
rel traits to be problematic and a potential place for hidden bugs (row
type has the same problem) . It means that trait derivation of a relnode is
based on the rules that do transformation as opposed to the "physical"
impact of the relnode. (It also leads to derivation behavior for a relnode
being scattered in many different rules.) If moved to the rel node, it also
provides a second benefit, once you encapsulate this propagation logic, you
could also expose this as a trait derivation function that the planner
could use to seek out derivation paths.

At Dremio we toyed last year with the idea of adding a heuristic cycle on
top of the existing volano planner and relset state. In this model a
RelNode would have two additional methods: it would expose a trait
propagation function (as described above) and optionally expose one or more
specific traits this node desired. When the planner arrived at a
conclusion, you'd run the heuristic cycle to further propagate desired
traits (if possible) and then restart the planning cycle based on any new
transformations done during the heuristic stage. You'd then repeat this
volcano/trait prop cycle until you arrive at a "completed" state.

We never actually got to implementation but I'm super supportive of someone
picking this up.



On Sat, Oct 19, 2019 at 12:25 AM Stamatis Zampetakis <za...@gmail.com>
wrote:

> Thanks all for the very interesting usecases and helpful examples.
>
> I would like to stay a bit on the fact that logical operators do not have
> physical traits. Calcite's logical operators do have at least one physical
> trait which is Convention.NONE. Other logical operators such as:
>
> LogicalTableScan [1]
> LogicalFilter [2]
> LogicalProject [3]
> LogicalWindow [4]
>
> have additional traits regarding collation and distribution. There is
> already some sort of trait derivation so to some extend it is possible to
> check the traitset of the child (logical) operator before requesting some
> other traitset when creating the parent (physical).
>
> I see that this mechanism of adding explicitly traits to logical operators
> may be confusing and may also lead to planning problems. Replacing it by
> metadata might be a good idea and it is closer to the idea of
> "applicability function" mentioned in the Volcano paper. Assuming that we
> follow this approach I would assume that the traitset of logical operators
> from now on should be always empty.
>
> Is this what you have in mind Haisheng?
>
> Best,
> Stamatis
>
> [1]
>
> https://github.com/apache/calcite/blob/cd24cae77072e56e4333d10114bf380be79709f1/core/src/main/java/org/apache/calcite/rel/logical/LogicalTableScan.java#L95
> [2]
>
> https://github.com/apache/calcite/blob/cd24cae77072e56e4333d10114bf380be79709f1/core/src/main/java/org/apache/calcite/rel/logical/LogicalFilter.java#L105
> [3]
>
> https://github.com/apache/calcite/blob/cd24cae77072e56e4333d10114bf380be79709f1/core/src/main/java/org/apache/calcite/rel/logical/LogicalProject.java#L104
> [4]
>
> https://github.com/apache/calcite/blob/cd24cae77072e56e4333d10114bf380be79709f1/core/src/main/java/org/apache/calcite/rel/logical/LogicalWindow.java#L95
>
> On Sat, Oct 19, 2019 at 7:39 AM Xiening Dai <xn...@gmail.com> wrote:
>
> > Thanks for the sharing. I like the way you model this problem, Jinfeng.
> >
> > There’s one minor issue with your example. Let say if R and S doesn’t
> have
> > sorting properties at all. In your case, we would end up adding enforcers
> > for LHS and RHS to get collation (a, b, c). Then we would need another
> > enforcer to get collation (b, c). This is a sub optimal plan as we could
> > have use (b, c, a) for join.
> >
> > I think in step #2, the join operator would need to take the agg trait
> > requirement into account. Then it would have two options -
> >
> > 1) require *exact/super* match of (b, c, a) or (c, b, a); this is to
> > guarantee the join output would deliver the collation agg needs.
> > 2) require permutation match of (a, b, c); in such case, an enforcer
> might
> > be needed for aggregation.
> >
> > Eventually the cost model decides who is the winner.
> >
> > There’s a fundamental difference between your model and Haisheng’s
> > proposal. In Haisheng’s case, a rel node not only looks at its parent’s
> > requirement, but also tries to get the potential traits its input could
> > deliver. It would try to align them to eliminate unnecessary
> alternatives.
> >
> > In above example, assuming R is (b, c, a) and S is (a, b, c), to
> implement
> > option 1), we would generate two alternatives -
> >
> > MergeJoin (b, c, a)
> > TableScan R
> > Sort(b, c, a)
> > TableScan S
> >
> > MergeJoin(c, b, a)
> > Sort(c, b, a)
> > TableScan R
> > Sort(c, b, a)
> > TableScan S
> >
> > But if we look at the input traits and has the insight that R already
> > delivers (b, c, a), we could decide to require (b, c, a) only and avoid
> > generating the 2nd plan, which is definitely worse, and reduce the search
> > space.
> >
> >
> > > On Oct 18, 2019, at 4:57 PM, Jinfeng Ni <jn...@apache.org> wrote:
> > >
> > > A little bit of history. In Drill, when we first implemented
> > > Distribution trait's definition, we allows both exact match and
> > > partial match in satisfy() method. This works fine for single-input
> > > operator such aggregation, however it leads to incorrect plan for join
> > > query, i.e LHS shuffle with (a, b), RHS shuffle with (a) . At that
> > > time, we removed partial match, and use exact match only. Yet this
> > > changes leads to unnecessary additional exchange. To mitigate this
> > > problem, in join physical operator, for a join key (a, b, c), we
> > > enumerate different distribution requests, yet this lead to more space
> > > to explore and significantly increase planning time (which is probably
> > > what Haisheng also experienced). When I look back, I feel probably
> > > what we miss is the "coordination" step in the join operator, because
> > > if we relax the requirement of satisfy(), for multi-input operators,
> > > we have to enforce some "coordination", to make sure multiple input's
> > > trait could work together properly.
> > >
> > >
> > >
> > > On Fri, Oct 18, 2019 at 4:38 PM Jinfeng Ni <jn...@apache.org> wrote:
> > >>
> > >> This is an interesting topic. Thanks for bringing up this issue.
> > >>
> > >> My understanding of Volcano planner is it works in a top-down search
> > >> mode (the parent asks for certain trait of its child), while the trait
> > >> propagates in a bottom-up way, as Stamatis explained.
> > >>
> > >> IMHO, the issue comes down to the definition of RelTrait, how to
> > >> determine if a trait A could satisfy a request asking for trait B,
> > >> that is, how RelTrait.satisfies() method is implemented.
> > >>
> > >> Let's first clarify different situations, using collation as example.
> > >> 1) The collation is requested by query's outmost ORDER BY clause.
> > >> - The generated plan has to have "exact match", i.e same collation
> > >> (same column sequence), or "super match" .
> > >> exact match: (a, b) satisfy (a, b)
> > >> super match: (a, b, c) satisfy (a, b)
> > >>
> > >> 2) The collation is requested by operand with single input, such as
> > >> sort-based Aggregation.
> > >> - In such case, a "permutation match" is sufficient.
> > >> For instance, for Aggregation (b,c), input with collation (c, b)
> > >> could satisfy the requirement.
> > >> permutation match: (b, c) satisfy (c, b). (c, b) satisfy (c,
> b)
> > >> permutation match: (b, c, a) satisfy (c, b). (c, b, a) satisfy
> (c,
> > b)
> > >>
> > >> 3) The collation is requested by operand with >= 2 inputs, such as
> > >> sort-based MergeJoin.
> > >> - A permutation match is sufficient for each input
> > >> - MergeJoin has to do coordination, after input's trait propagates
> > >> upwards. In other words, ensure both inputs's permutation match are
> > >> actually same sequence. Otherwise, enforcer could be inserted upon
> > >> each input, and the planner generates two plans and let the cost
> > >> decide.
> > >>
> > >> For the first case, this is how today's RelCollation's satisfy()
> > >> method is implemented.
> > >>
> > >> For the second / third cases, use Haisheng's example,
> > >>
> > >> SELECT DISTINCT c, b FROM
> > >> ( SELECT R.c c, S.b b FROM R, S
> > >> WHERE R.a=S.a and R.b=S.b and R.c=S.c) t;
> > >>
> > >> Aggregate . (c, b)
> > >> +--- MergeJoin . (a, b, c)
> > >> |--- TableScan on R
> > >> +--- TableScan on S
> > >>
> > >> Here is the steps that might take place in the planner:
> > >>
> > >> 1) Aggregate request permutation match collation (c, b)
> > >> 2) MergeJoin request a permutation match of (a, b,c) on both it's
> input
> > >> 3) R respond with collation (c, b, a), which satisfy MergeJoin's LHS
> > requirement
> > >> 4) S respond with collation (b, c, a), which satisfy MergeJoins' RHS
> > requirement
> > >> 5) MergeJoin do a coordination o LHS, RHS, and generate two possible
> > plans
> > >> MJ1: Insert a sort of (c, b, a) on RHS. This MJ operator now has
> > >> collation of (c, b, a)
> > >> MJ2: Insert a sort of (b, c, a) on LHS. This MJ operator now has
> > >> collation of (b, c, a)
> > >> 6) MJ1 and MJ2 could both satisfy permutation match request in step
> > >> 1, leading to two possible plans:
> > >> Agg1: with input of MJ1
> > >> Agg2: with input of MJ2
> > >> 7) planner chooses a best plan based on cost of Agg1 and Agg2.
> > >>
> > >> I should point that the enforcer sort inserted in step 5 could help
> > >> remove redundant sort in its input, if the input's collation is
> > >> obtained from sort, by invoking Calcite's SortRemove Rule.
> > >>
> > >> The above only considers the column sequence. The DESC/ASC, NULL
> > >> FIRST/LAST will add more complexity, but we probably use similar idea.
> > >>
> > >> In summary, we need :
> > >> 1) redefine collation trait's satisfy() policy, exact match, super
> > >> match, permutation match,
> > >> 2) different physical operator applies different trait matching
> > >> policy, depending on operator's # of inputs, and algorithm
> > >> implementation.
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> On Fri, Oct 18, 2019 at 2:51 PM Haisheng Yuan <h.yuan@alibaba-inc.com
> >
> > wrote:
> > >>>
> > >>> Hi Stamatis,
> > >>>
> > >>> Thanks for your comment. I think my example didn't make it clear.
> > >>>
> > >>> When a logical operator is created, it doesn't have any physical,
> > >>> propertyand it shouldn't have. When a physical operator is created,
> > >>> e.g. in Enumerable convention, it only creates an intuitive traitset
> > >>> with it, and requests it children the corresponding ones.
> > >>>
> > >>> For operators such as Join, Aggregate, Window, which may deliver
> > >>> multiple different traitsets, when the parent operator is created and
> > >>> request its traitset, it might be good to know what are the poosible
> > >>> traitset that the child operator can deliver. e.g.
> > >>>
> > >>> SELECT DISTINCT c, b FROM
> > >>> ( SELECT R.c c, S.b b FROM R, S
> > >>> WHERE R.a=S.a and R.b=S.b and R.c=S.c) t;
> > >>>
> > >>> Suppose R is ordered by (c, b, a), and S is ordered by (b, c, a).
> > >>> Here is the logical plan:
> > >>> Aggregate
> > >>> +--- InnerJoin
> > >>> |--- TableScan on R
> > >>> +--- TableScan on S
> > >>>
> > >>> When we create a physical merge join for the inner join, it may just
> > >>> have collation sorted on a,b,c. Then the aggreate on top of join will
> > >>> request another sort on c,b, thus we miss the best plan. What we
> > >>> can do is requesting all the order combinations, which is n!, like
> > >>> how the Values operator does. But that is too much.
> > >>>
> > >>> If we can provide an approach that can minimize the possiple traitset
> > >>> that the child operator may deliver, we can reduce the chance of
> > missing
> > >>> good plans. For the above query, the Aggregate operator can derive
> > >>> possible traitsets that its child operator join can deliver, in which
> > case,
> > >>> the possiple traitsets of join is
> > >>> 1. collation on (a,b,c) based on join condition,
> > >>> 2. collation on (c,b,a) based on left child,
> > >>> 3. collation on (b,c,a) based on right child
> > >>> So we can request Aggregate sorted by (c,b) and Join sorted by
> (c,b,a).
> > >>> The number of traiset requests and plan alternatives can be reduced.
> > >>> The DerivedTraitSets can be used to derive the possible traitsets
> from
> > >>> Join, and pass through Project, Filter etc...
> > >>>
> > >>> This is just an example of non-distributed system, for distributed
> > system,
> > >>> it can save much more by considering the possible distribution
> > delivered
> > >>> by child operators.
> > >>>
> > >>> One thing that concerns me is it highly relies on the traiset system
> > of the
> > >>> underlying physical system. Like Enumerable doesn't consider
> > distribution,
> > >>> because it is single-node system, but Hive/Flink are distributed
> > system.
> > >>> - Haisheng
> > >>>
> > >>> ------------------------------------------------------------------
> > >>> 发件人：Stamatis Zampetakis<za...@gmail.com>
> > >>> 日 期：2019年10月18日 14:53:41
> > >>> 收件人：<de...@calcite.apache.org>
> > >>> 主 题：Re: [DISCUSS] On-demand traitset request
> > >>>
> > >>> Hi Haisheng,
> > >>>
> > >>> This is an interesting topic but somehow in my mind I thought that
> this
> > >>> mechanism is already in place.
> > >>>
> > >>> When an operator (logical or physical) is created its traitset is
> > >>> determined in bottom-up fashion using the create
> > >>> static factory method present in almost all operators. In my mind
> this
> > is
> > >>> in some sense the applicability function
> > >>> mentioned in [1].
> > >>>
> > >>> Now during optimization we proceed in top-down manner and we request
> > >>> certain traitsets from the operators.
> > >>> If it happens and they contain already the requested traits nothing
> > needs
> > >>> to be done.
> > >>>
> > >>> In your example when we are about to create the sort-merge join we
> can
> > >>> check what traitsets are present in the inputs
> > >>> and if possible request those. Can you elaborate a bit more why do we
> > need
> > >>> a new type of metadata?
> > >>>
> > >>> Anyway if we cannot do it at the moment it makes sense to complete
> the
> > >>> missing bits since what you are describing
> > >>> was already mentioned in the original design of the Volcano optimizer
> > [1].
> > >>>
> > >>> "If a move to be pursued is the exploration of a normal query
> > processing
> > >>> algorithm such as merge-join, its cost is calculated by the
> algorithm's
> > >>> cost function. The algorithm's applicability function determines the
> > >>> physical properly vectors for the algorithms inputs, and their costs
> > and
> > >>> optimal plans are found by invoking FindBestPlan for the inputs. For
> > some
> > >>> binary operators, the actual physical properties of the inputs are
> not
> > as
> > >>> important as the consistency of physical properties among the inputs.
> > For
> > >>> example, for a sort-based implementation of intersection, i.e., an
> > >>> algorithm very similar to merge-join, any sort order of the two
> inputs
> > will
> > >>> suffice as long as the two inputs are sorted in the same way.
> > Similarly,
> > >>> for a parallel join, any partitioning of join inputs across multiple
> > >>> processing nodes is acceptable if both inputs are partitioned using
> > >>> Compatible partitioning rules. For these cases, the search engine
> > permits
> > >>> the optimizer implementor to specify a number of physical property
> > vectors
> > >>> to be tried. For example, for the intersection of two inputs R and S
> > with
> > >>> attributes A, B, and C where R is sorted on (A,B,C) and S is sorted
> on
> > >>> (B,A,C), both these sort orders can be specified by the optimizer
> > >>> implementor and will be optimized by the generated optimizer, while
> > other
> > >>> possible sort orders, e.g., (C,B,A), will be ignored. " [1]
> > >>>
> > >>> Best,
> > >>> Stamatis
> > >>>
> > >>> [1]
> > >>>
> >
> https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Volcano-graefe.pdf
> > >>>
> > >>> On Fri, Oct 18, 2019 at 4:56 AM Haisheng Yuan <
> h.yuan@alibaba-inc.com>
> > >>> wrote:
> > >>>
> > >>>> TL;DR
> > >>>> Both top-down physical TraitSet request and bottom-up TraitSet
> > >>>> derivation have their strongth and weakness, we propose
> > >>>> on-demand TraitSet request to combine the above two, to reduce
> > >>>> the number of plan alternatives that are genereated, especially
> > >>>> in distributed system.
> > >>>>
> > >>>> e.g.
> > >>>> select * from foo join bar on f1=b1 and f2=b2 and f3=b3;
> > >>>>
> > >>>> In non-distributed system, we can generate a sort merge join,
> > >>>> requesting foo sorted by f1,f2,f3 and bar sorted by b1,b2,b3.
> > >>>> But if foo happens to be sorted by f3,f2,f1, we may miss the
> > >>>> chance of making use of the delivered ordering of foo. Because
> > >>>> if we require bar to be sorted by b3,b2,b1, we don't need to
> > >>>> sort on foo anymore. There are so many choices, n!, not even
> > >>>> considering asc/desc and null direction. We can't request all
> > >>>> the possible traitsets in top-down way, and can't derive all the
> > >>>> possible traitsets in bottom-up way either.
> > >>>>
> > >>>> We propose on-demand traitset request by adding a new type
> > >>>> of metadata DerivedTraitSets into the built-in metadata system.
> > >>>>
> > >>>> List<RelTraitSet> deriveTraitSets(RelNode, RelMetadataQuery)
> > >>>>
> > >>>> In this metadata, every operator returns several possbile traitsets
> > >>>> that may be derived from this operator.
> > >>>>
> > >>>> Using above query as an example, the tablescan on foo should
> > >>>> return traiset with collation on f3, f2, f1.
> > >>>>
> > >>>> In physical implementation rules, e.g. the SortMergeJoinRule,
> > >>>> it gets possible traitsets from both child operators, uses the join
> > >>>> keys to eliminate useless traitsets, leaves out usefull traitsets,
> > >>>> and requests corresponding traitset on the other child.
> > >>>>
> > >>>> This relies on the feature of AbstractConverter, which is turned
> > >>>> off by default, due to performance issue [1].
> > >>>>
> > >>>> Thoughts?
> > >>>>
> > >>>> [1] https://issues.apache.org/jira/browse/CALCITE-2970
> > >>>>
> > >>>> Haisheng
> > >>>>
> > >>>>
> > >>>
> >
> >
>

Re: [DISCUSS] On-demand traitset request

Posted by Jacques Nadeau <ja...@apache.org>.

Definitely agree that this has been a long time missing. I've been
challenged by this absence since before Calcite was Calcite. I also
remember the trials and tribulations around this that Jinfeng references
above.

In general, I think the first thing one might want to before actually doing
this is to make trait derivation internally defined based on the impact
that a rel node has on traits. I've always found the externally provided
rel traits to be problematic and a potential place for hidden bugs (row
type has the same problem) . It means that trait derivation of a relnode is
based on the rules that do transformation as opposed to the "physical"
impact of the relnode. (It also leads to derivation behavior for a relnode
being scattered in many different rules.) If moved to the rel node, it also
provides a second benefit, once you encapsulate this propagation logic, you
could also expose this as a trait derivation function that the planner
could use to seek out derivation paths.

At Dremio we toyed last year with the idea of adding a heuristic cycle on
top of the existing volano planner and relset state. In this model a
RelNode would have two additional methods: it would expose a trait
propagation function (as described above) and optionally expose one or more
specific traits this node desired. When the planner arrived at a
conclusion, you'd run the heuristic cycle to further propagate desired
traits (if possible) and then restart the planning cycle based on any new
transformations done during the heuristic stage. You'd then repeat this
volcano/trait prop cycle until you arrive at a "completed" state.

We never actually got to implementation but I'm super supportive of someone
picking this up.



On Sat, Oct 19, 2019 at 12:25 AM Stamatis Zampetakis <za...@gmail.com>
wrote:

> Thanks all for the very interesting usecases and helpful examples.
>
> I would like to stay a bit on the fact that logical operators do not have
> physical traits. Calcite's logical operators do have at least one physical
> trait which is Convention.NONE. Other logical operators such as:
>
> LogicalTableScan [1]
> LogicalFilter [2]
> LogicalProject [3]
> LogicalWindow [4]
>
> have additional traits regarding collation and distribution. There is
> already some sort of trait derivation so to some extend it is possible to
> check the traitset of the child (logical) operator before requesting some
> other traitset when creating the parent (physical).
>
> I see that this mechanism of adding explicitly traits to logical operators
> may be confusing and may also lead to planning problems. Replacing it by
> metadata might be a good idea and it is closer to the idea of
> "applicability function" mentioned in the Volcano paper. Assuming that we
> follow this approach I would assume that the traitset of logical operators
> from now on should be always empty.
>
> Is this what you have in mind Haisheng?
>
> Best,
> Stamatis
>
> [1]
>
> https://github.com/apache/calcite/blob/cd24cae77072e56e4333d10114bf380be79709f1/core/src/main/java/org/apache/calcite/rel/logical/LogicalTableScan.java#L95
> [2]
>
> https://github.com/apache/calcite/blob/cd24cae77072e56e4333d10114bf380be79709f1/core/src/main/java/org/apache/calcite/rel/logical/LogicalFilter.java#L105
> [3]
>
> https://github.com/apache/calcite/blob/cd24cae77072e56e4333d10114bf380be79709f1/core/src/main/java/org/apache/calcite/rel/logical/LogicalProject.java#L104
> [4]
>
> https://github.com/apache/calcite/blob/cd24cae77072e56e4333d10114bf380be79709f1/core/src/main/java/org/apache/calcite/rel/logical/LogicalWindow.java#L95
>
> On Sat, Oct 19, 2019 at 7:39 AM Xiening Dai <xn...@gmail.com> wrote:
>
> > Thanks for the sharing. I like the way you model this problem, Jinfeng.
> >
> > There’s one minor issue with your example. Let say if R and S doesn’t
> have
> > sorting properties at all. In your case, we would end up adding enforcers
> > for LHS and RHS to get collation (a, b, c). Then we would need another
> > enforcer to get collation (b, c). This is a sub optimal plan as we could
> > have use (b, c, a) for join.
> >
> > I think in step #2, the join operator would need to take the agg trait
> > requirement into account. Then it would have two options -
> >
> > 1) require *exact/super* match of  (b, c, a) or (c, b, a); this is to
> > guarantee the join output would deliver the collation agg needs.
> > 2) require permutation match of (a, b, c); in such case, an enforcer
> might
> > be needed for aggregation.
> >
> > Eventually the cost model decides who is the winner.
> >
> > There’s a fundamental difference between your model and Haisheng’s
> > proposal. In Haisheng’s case, a rel node not only looks at its parent’s
> > requirement, but also tries to get the potential traits its input could
> > deliver. It would try to align them to eliminate unnecessary
> alternatives.
> >
> > In above example, assuming R is (b, c, a) and S is (a, b, c), to
> implement
> > option 1), we would generate two alternatives -
> >
> > MergeJoin (b, c, a)
> >         TableScan R
> >         Sort(b, c, a)
> >                 TableScan S
> >
> > MergeJoin(c, b, a)
> >         Sort(c, b, a)
> >                 TableScan R
> >         Sort(c, b, a)
> >                 TableScan S
> >
> > But if we look at the input traits and has the insight that R already
> > delivers (b, c, a), we could decide to require (b, c, a) only and avoid
> > generating the 2nd plan, which is definitely worse, and reduce the search
> > space.
> >
> >
> > > On Oct 18, 2019, at 4:57 PM, Jinfeng Ni <jn...@apache.org> wrote:
> > >
> > > A little bit of history.  In Drill,  when we first implemented
> > > Distribution trait's definition,  we allows both exact match and
> > > partial match in satisfy() method. This works fine for single-input
> > > operator such aggregation, however it leads to incorrect plan for join
> > > query, i.e LHS shuffle with (a, b),  RHS shuffle with (a) .  At that
> > > time, we removed partial match, and use exact match only. Yet this
> > > changes leads to unnecessary additional exchange.  To mitigate this
> > > problem, in join physical operator, for a join key (a, b, c),  we
> > > enumerate different distribution requests, yet this lead to more space
> > > to explore and significantly increase planning time (which is probably
> > > what Haisheng also experienced).  When I look back, I feel probably
> > > what we miss is the "coordination" step in the join operator, because
> > > if we relax the requirement of satisfy(), for multi-input operators,
> > > we have to enforce some "coordination", to make sure multiple input's
> > > trait could work together properly.
> > >
> > >
> > >
> > > On Fri, Oct 18, 2019 at 4:38 PM Jinfeng Ni <jn...@apache.org> wrote:
> > >>
> > >> This is an interesting topic. Thanks for bringing up this issue.
> > >>
> > >> My understanding of Volcano planner is it works in a top-down search
> > >> mode (the parent asks for certain trait of its child), while the trait
> > >> propagates in a bottom-up way, as Stamatis explained.
> > >>
> > >> IMHO, the issue comes down to the definition of RelTrait, how to
> > >> determine if a trait A could satisfy a request asking for trait B,
> > >> that is, how RelTrait.satisfies() method is implemented.
> > >>
> > >> Let's first clarify different situations, using collation as example.
> > >> 1) The collation is requested by query's outmost ORDER BY clause.
> > >>   - The generated plan has to have "exact match", i.e same collation
> > >> (same column sequence), or "super match" .
> > >> exact match:   (a, b)  satisfy  (a, b)
> > >> super match:   (a, b, c)  satisfy (a, b)
> > >>
> > >> 2) The collation is requested by operand with single input, such as
> > >> sort-based Aggregation.
> > >>   - In such case, a "permutation match" is sufficient.
> > >> For instance,  for Aggregation (b,c),  input with collation (c, b)
> > >> could satisfy the requirement.
> > >> permutation match:  (b, c) satisfy (c, b).         (c, b) satisfy (c,
> b)
> > >> permutation match:  (b, c, a) satisfy (c, b).     (c, b, a) satisfy
> (c,
> > b)
> > >>
> > >> 3) The collation is requested by operand with >= 2 inputs, such as
> > >> sort-based MergeJoin.
> > >>  - A permutation match is sufficient for each input
> > >>  - MergeJoin has to do coordination, after input's trait propagates
> > >> upwards. In other words,  ensure both inputs's permutation match are
> > >> actually same sequence. Otherwise,  enforcer could be inserted upon
> > >> each input, and the planner generates two plans and let the cost
> > >> decide.
> > >>
> > >> For the first case, this is how today's RelCollation's satisfy()
> > >> method is implemented.
> > >>
> > >> For the second / third cases, use Haisheng's example,
> > >>
> > >> SELECT DISTINCT c, b FROM
> > >>  ( SELECT R.c c, S.b b FROM R, S
> > >>        WHERE R.a=S.a and R.b=S.b and R.c=S.c) t;
> > >>
> > >> Aggregate . (c, b)
> > >>    +--- MergeJoin . (a, b, c)
> > >>                |--- TableScan on R
> > >>                +--- TableScan on S
> > >>
> > >> Here is the steps that might take place in the planner:
> > >>
> > >> 1) Aggregate request permutation match collation (c, b)
> > >> 2) MergeJoin request a permutation match of (a, b,c) on both it's
> input
> > >> 3) R respond with collation (c, b, a), which satisfy MergeJoin's LHS
> > requirement
> > >> 4) S respond with collation (b, c, a), which satisfy MergeJoins' RHS
> > requirement
> > >> 5) MergeJoin do a coordination o LHS, RHS, and generate two possible
> > plans
> > >>   MJ1:   Insert a sort of (c, b, a) on RHS.  This MJ operator now has
> > >> collation of (c, b, a)
> > >>   MJ2:   Insert a sort of (b, c, a) on LHS.  This MJ operator now has
> > >> collation of (b, c, a)
> > >> 6) MJ1 and MJ2 could both satisfy  permutation match request in step
> > >> 1, leading to two possible plans:
> > >>  Agg1:  with input of MJ1
> > >>  Agg2:  with input of MJ2
> > >> 7) planner chooses a best plan based on cost of Agg1 and Agg2.
> > >>
> > >> I should point that the enforcer sort inserted in step 5 could help
> > >> remove redundant sort in its input, if the input's collation is
> > >> obtained from sort, by invoking Calcite's SortRemove Rule.
> > >>
> > >> The above only considers the column sequence. The DESC/ASC, NULL
> > >> FIRST/LAST will add more complexity, but we probably use similar idea.
> > >>
> > >> In summary,  we need :
> > >>  1) redefine collation trait's satisfy() policy,  exact match, super
> > >> match, permutation match,
> > >>  2) different physical operator applies different trait matching
> > >> policy, depending on operator's # of inputs, and algorithm
> > >> implementation.
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> On Fri, Oct 18, 2019 at 2:51 PM Haisheng Yuan <h.yuan@alibaba-inc.com
> >
> > wrote:
> > >>>
> > >>> Hi Stamatis,
> > >>>
> > >>> Thanks for your comment. I think my example didn't make it clear.
> > >>>
> > >>> When a logical operator is created, it doesn't have any physical,
> > >>> propertyand it shouldn't have. When a physical operator is created,
> > >>> e.g. in Enumerable convention, it only creates an intuitive traitset
> > >>> with it, and requests it children the corresponding ones.
> > >>>
> > >>> For operators such as Join, Aggregate, Window, which may deliver
> > >>> multiple different traitsets, when the parent operator is created and
> > >>> request its traitset, it might be good to know what are the poosible
> > >>> traitset that the child operator can deliver. e.g.
> > >>>
> > >>> SELECT DISTINCT c, b FROM
> > >>>  ( SELECT R.c c, S.b b FROM R, S
> > >>>        WHERE R.a=S.a and R.b=S.b and R.c=S.c) t;
> > >>>
> > >>> Suppose R is ordered by (c, b, a), and S is ordered by (b, c, a).
> > >>> Here is the logical plan:
> > >>> Aggregate
> > >>>    +--- InnerJoin
> > >>>                |--- TableScan on R
> > >>>                +--- TableScan on S
> > >>>
> > >>> When we create a physical merge join for the inner join, it may just
> > >>> have collation sorted on a,b,c. Then the aggreate on top of join will
> > >>> request another sort on c,b, thus we miss the best plan. What we
> > >>> can do is requesting all the order combinations, which is n!, like
> > >>> how the Values operator does. But that is too much.
> > >>>
> > >>> If we can provide an approach that can minimize the possiple traitset
> > >>> that the child operator may deliver, we can reduce the chance of
> > missing
> > >>> good plans. For the above query, the Aggregate operator can derive
> > >>> possible traitsets that its child operator join can deliver, in which
> > case,
> > >>> the possiple traitsets of join is
> > >>> 1. collation on (a,b,c) based on join condition,
> > >>> 2. collation on (c,b,a) based on left child,
> > >>> 3. collation on (b,c,a) based on right child
> > >>> So we can request Aggregate sorted by (c,b) and Join sorted by
> (c,b,a).
> > >>> The number of traiset requests and plan alternatives can be reduced.
> > >>> The DerivedTraitSets can be used to derive the possible traitsets
> from
> > >>> Join, and pass through Project, Filter etc...
> > >>>
> > >>> This is just an example of non-distributed system, for distributed
> > system,
> > >>> it can save much more by considering the possible distribution
> > delivered
> > >>> by child operators.
> > >>>
> > >>> One thing that concerns me is it highly relies on the traiset system
> > of the
> > >>> underlying physical system. Like Enumerable doesn't consider
> > distribution,
> > >>> because it is single-node system, but Hive/Flink are distributed
> > system.
> > >>> - Haisheng
> > >>>
> > >>> ------------------------------------------------------------------
> > >>> 发件人：Stamatis Zampetakis<za...@gmail.com>
> > >>> 日 期：2019年10月18日 14:53:41
> > >>> 收件人：<de...@calcite.apache.org>
> > >>> 主 题：Re: [DISCUSS] On-demand traitset request
> > >>>
> > >>> Hi Haisheng,
> > >>>
> > >>> This is an interesting topic but somehow in my mind I thought that
> this
> > >>> mechanism is already in place.
> > >>>
> > >>> When an operator (logical or physical) is created its traitset is
> > >>> determined in bottom-up fashion using the create
> > >>> static factory method present in almost all operators. In my mind
> this
> > is
> > >>> in some sense the applicability function
> > >>> mentioned in [1].
> > >>>
> > >>> Now during optimization we proceed in top-down manner and we request
> > >>> certain traitsets from the operators.
> > >>> If it happens and they contain already the requested traits nothing
> > needs
> > >>> to be done.
> > >>>
> > >>> In your example when we are about to create the sort-merge join we
> can
> > >>> check what traitsets are present in the inputs
> > >>> and if possible request those. Can you elaborate a bit more why do we
> > need
> > >>> a new type of metadata?
> > >>>
> > >>> Anyway if we cannot do it at the moment it makes sense to complete
> the
> > >>> missing bits since what you are describing
> > >>> was already mentioned in the original design of the Volcano optimizer
> > [1].
> > >>>
> > >>> "If a move to be pursued is the exploration of a normal query
> > processing
> > >>> algorithm such as merge-join, its cost is calculated by the
> algorithm's
> > >>> cost function. The algorithm's applicability function determines the
> > >>> physical properly vectors for the algorithms inputs, and their costs
> > and
> > >>> optimal plans are found by invoking FindBestPlan for the inputs. For
> > some
> > >>> binary operators, the actual physical properties of the inputs are
> not
> > as
> > >>> important as the consistency of physical properties among the inputs.
> > For
> > >>> example, for a sort-based implementation of intersection, i.e., an
> > >>> algorithm very similar to merge-join, any sort order of the two
> inputs
> > will
> > >>> suffice as long as the two inputs are sorted in the same way.
> > Similarly,
> > >>> for a parallel join, any partitioning of join inputs across multiple
> > >>> processing nodes is acceptable if both inputs are partitioned using
> > >>> Compatible partitioning rules. For these cases, the search engine
> > permits
> > >>> the optimizer implementor to specify a number of physical property
> > vectors
> > >>> to be tried. For example, for the intersection of two inputs R and S
> > with
> > >>> attributes A, B, and C where R is sorted on (A,B,C) and S is sorted
> on
> > >>> (B,A,C), both these sort orders can be specified by the optimizer
> > >>> implementor and will be optimized by the generated optimizer, while
> > other
> > >>> possible sort orders, e.g., (C,B,A), will be ignored. " [1]
> > >>>
> > >>> Best,
> > >>> Stamatis
> > >>>
> > >>> [1]
> > >>>
> >
> https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Volcano-graefe.pdf
> > >>>
> > >>> On Fri, Oct 18, 2019 at 4:56 AM Haisheng Yuan <
> h.yuan@alibaba-inc.com>
> > >>> wrote:
> > >>>
> > >>>> TL;DR
> > >>>> Both top-down physical TraitSet request and bottom-up TraitSet
> > >>>> derivation have their strongth and weakness, we propose
> > >>>> on-demand TraitSet request to combine the above two, to reduce
> > >>>> the number of plan alternatives that are genereated, especially
> > >>>> in distributed system.
> > >>>>
> > >>>> e.g.
> > >>>> select * from foo join bar on f1=b1 and f2=b2 and f3=b3;
> > >>>>
> > >>>> In non-distributed system, we can generate a sort merge join,
> > >>>> requesting foo sorted by f1,f2,f3 and bar sorted by b1,b2,b3.
> > >>>> But if foo happens to be sorted by f3,f2,f1, we may miss the
> > >>>> chance of making use of the delivered ordering of foo. Because
> > >>>> if we require bar to be sorted by b3,b2,b1, we don't need to
> > >>>> sort on foo anymore. There are so many choices, n!, not even
> > >>>> considering asc/desc and null direction. We can't request all
> > >>>> the possible traitsets in top-down way, and can't derive all the
> > >>>> possible traitsets in bottom-up way either.
> > >>>>
> > >>>> We propose on-demand traitset request by adding a new type
> > >>>> of metadata DerivedTraitSets into the built-in metadata system.
> > >>>>
> > >>>> List<RelTraitSet> deriveTraitSets(RelNode, RelMetadataQuery)
> > >>>>
> > >>>> In this metadata, every operator returns several possbile traitsets
> > >>>> that may be derived from this operator.
> > >>>>
> > >>>> Using above query as an example, the tablescan on foo should
> > >>>> return traiset with collation on f3, f2, f1.
> > >>>>
> > >>>> In physical implementation rules, e.g. the SortMergeJoinRule,
> > >>>> it gets possible traitsets from both child operators, uses the join
> > >>>> keys to eliminate useless traitsets, leaves out usefull traitsets,
> > >>>> and requests corresponding traitset on the other child.
> > >>>>
> > >>>> This relies on the feature of AbstractConverter, which is turned
> > >>>> off by default, due to performance issue [1].
> > >>>>
> > >>>> Thoughts?
> > >>>>
> > >>>> [1] https://issues.apache.org/jira/browse/CALCITE-2970
> > >>>>
> > >>>> Haisheng
> > >>>>
> > >>>>
> > >>>
> >
> >
>

Re: [DISCUSS] On-demand traitset request

Posted by Stamatis Zampetakis <za...@gmail.com>.

Thanks all for the very interesting usecases and helpful examples.

I would like to stay a bit on the fact that logical operators do not have
physical traits. Calcite's logical operators do have at least one physical
trait which is Convention.NONE. Other logical operators such as:

LogicalTableScan [1]
LogicalFilter [2]
LogicalProject [3]
LogicalWindow [4]

have additional traits regarding collation and distribution. There is
already some sort of trait derivation so to some extend it is possible to
check the traitset of the child (logical) operator before requesting some
other traitset when creating the parent (physical).

I see that this mechanism of adding explicitly traits to logical operators
may be confusing and may also lead to planning problems. Replacing it by
metadata might be a good idea and it is closer to the idea of
"applicability function" mentioned in the Volcano paper. Assuming that we
follow this approach I would assume that the traitset of logical operators
from now on should be always empty.

Is this what you have in mind Haisheng?

Best,
Stamatis

[1]
https://github.com/apache/calcite/blob/cd24cae77072e56e4333d10114bf380be79709f1/core/src/main/java/org/apache/calcite/rel/logical/LogicalTableScan.java#L95
[2]
https://github.com/apache/calcite/blob/cd24cae77072e56e4333d10114bf380be79709f1/core/src/main/java/org/apache/calcite/rel/logical/LogicalFilter.java#L105
[3]
https://github.com/apache/calcite/blob/cd24cae77072e56e4333d10114bf380be79709f1/core/src/main/java/org/apache/calcite/rel/logical/LogicalProject.java#L104
[4]
https://github.com/apache/calcite/blob/cd24cae77072e56e4333d10114bf380be79709f1/core/src/main/java/org/apache/calcite/rel/logical/LogicalWindow.java#L95

On Sat, Oct 19, 2019 at 7:39 AM Xiening Dai <xn...@gmail.com> wrote:

> Thanks for the sharing. I like the way you model this problem, Jinfeng.
>
> There’s one minor issue with your example. Let say if R and S doesn’t have
> sorting properties at all. In your case, we would end up adding enforcers
> for LHS and RHS to get collation (a, b, c). Then we would need another
> enforcer to get collation (b, c). This is a sub optimal plan as we could
> have use (b, c, a) for join.
>
> I think in step #2, the join operator would need to take the agg trait
> requirement into account. Then it would have two options -
>
> 1) require *exact/super* match of  (b, c, a) or (c, b, a); this is to
> guarantee the join output would deliver the collation agg needs.
> 2) require permutation match of (a, b, c); in such case, an enforcer might
> be needed for aggregation.
>
> Eventually the cost model decides who is the winner.
>
> There’s a fundamental difference between your model and Haisheng’s
> proposal. In Haisheng’s case, a rel node not only looks at its parent’s
> requirement, but also tries to get the potential traits its input could
> deliver. It would try to align them to eliminate unnecessary alternatives.
>
> In above example, assuming R is (b, c, a) and S is (a, b, c), to implement
> option 1), we would generate two alternatives -
>
> MergeJoin (b, c, a)
>         TableScan R
>         Sort(b, c, a)
>                 TableScan S
>
> MergeJoin(c, b, a)
>         Sort(c, b, a)
>                 TableScan R
>         Sort(c, b, a)
>                 TableScan S
>
> But if we look at the input traits and has the insight that R already
> delivers (b, c, a), we could decide to require (b, c, a) only and avoid
> generating the 2nd plan, which is definitely worse, and reduce the search
> space.
>
>
> > On Oct 18, 2019, at 4:57 PM, Jinfeng Ni <jn...@apache.org> wrote:
> >
> > A little bit of history.  In Drill,  when we first implemented
> > Distribution trait's definition,  we allows both exact match and
> > partial match in satisfy() method. This works fine for single-input
> > operator such aggregation, however it leads to incorrect plan for join
> > query, i.e LHS shuffle with (a, b),  RHS shuffle with (a) .  At that
> > time, we removed partial match, and use exact match only. Yet this
> > changes leads to unnecessary additional exchange.  To mitigate this
> > problem, in join physical operator, for a join key (a, b, c),  we
> > enumerate different distribution requests, yet this lead to more space
> > to explore and significantly increase planning time (which is probably
> > what Haisheng also experienced).  When I look back, I feel probably
> > what we miss is the "coordination" step in the join operator, because
> > if we relax the requirement of satisfy(), for multi-input operators,
> > we have to enforce some "coordination", to make sure multiple input's
> > trait could work together properly.
> >
> >
> >
> > On Fri, Oct 18, 2019 at 4:38 PM Jinfeng Ni <jn...@apache.org> wrote:
> >>
> >> This is an interesting topic. Thanks for bringing up this issue.
> >>
> >> My understanding of Volcano planner is it works in a top-down search
> >> mode (the parent asks for certain trait of its child), while the trait
> >> propagates in a bottom-up way, as Stamatis explained.
> >>
> >> IMHO, the issue comes down to the definition of RelTrait, how to
> >> determine if a trait A could satisfy a request asking for trait B,
> >> that is, how RelTrait.satisfies() method is implemented.
> >>
> >> Let's first clarify different situations, using collation as example.
> >> 1) The collation is requested by query's outmost ORDER BY clause.
> >>   - The generated plan has to have "exact match", i.e same collation
> >> (same column sequence), or "super match" .
> >> exact match:   (a, b)  satisfy  (a, b)
> >> super match:   (a, b, c)  satisfy (a, b)
> >>
> >> 2) The collation is requested by operand with single input, such as
> >> sort-based Aggregation.
> >>   - In such case, a "permutation match" is sufficient.
> >> For instance,  for Aggregation (b,c),  input with collation (c, b)
> >> could satisfy the requirement.
> >> permutation match:  (b, c) satisfy (c, b).         (c, b) satisfy (c, b)
> >> permutation match:  (b, c, a) satisfy (c, b).     (c, b, a) satisfy (c,
> b)
> >>
> >> 3) The collation is requested by operand with >= 2 inputs, such as
> >> sort-based MergeJoin.
> >>  - A permutation match is sufficient for each input
> >>  - MergeJoin has to do coordination, after input's trait propagates
> >> upwards. In other words,  ensure both inputs's permutation match are
> >> actually same sequence. Otherwise,  enforcer could be inserted upon
> >> each input, and the planner generates two plans and let the cost
> >> decide.
> >>
> >> For the first case, this is how today's RelCollation's satisfy()
> >> method is implemented.
> >>
> >> For the second / third cases, use Haisheng's example,
> >>
> >> SELECT DISTINCT c, b FROM
> >>  ( SELECT R.c c, S.b b FROM R, S
> >>        WHERE R.a=S.a and R.b=S.b and R.c=S.c) t;
> >>
> >> Aggregate . (c, b)
> >>    +--- MergeJoin . (a, b, c)
> >>                |--- TableScan on R
> >>                +--- TableScan on S
> >>
> >> Here is the steps that might take place in the planner:
> >>
> >> 1) Aggregate request permutation match collation (c, b)
> >> 2) MergeJoin request a permutation match of (a, b,c) on both it's input
> >> 3) R respond with collation (c, b, a), which satisfy MergeJoin's LHS
> requirement
> >> 4) S respond with collation (b, c, a), which satisfy MergeJoins' RHS
> requirement
> >> 5) MergeJoin do a coordination o LHS, RHS, and generate two possible
> plans
> >>   MJ1:   Insert a sort of (c, b, a) on RHS.  This MJ operator now has
> >> collation of (c, b, a)
> >>   MJ2:   Insert a sort of (b, c, a) on LHS.  This MJ operator now has
> >> collation of (b, c, a)
> >> 6) MJ1 and MJ2 could both satisfy  permutation match request in step
> >> 1, leading to two possible plans:
> >>  Agg1:  with input of MJ1
> >>  Agg2:  with input of MJ2
> >> 7) planner chooses a best plan based on cost of Agg1 and Agg2.
> >>
> >> I should point that the enforcer sort inserted in step 5 could help
> >> remove redundant sort in its input, if the input's collation is
> >> obtained from sort, by invoking Calcite's SortRemove Rule.
> >>
> >> The above only considers the column sequence. The DESC/ASC, NULL
> >> FIRST/LAST will add more complexity, but we probably use similar idea.
> >>
> >> In summary,  we need :
> >>  1) redefine collation trait's satisfy() policy,  exact match, super
> >> match, permutation match,
> >>  2) different physical operator applies different trait matching
> >> policy, depending on operator's # of inputs, and algorithm
> >> implementation.
> >>
> >>
> >>
> >>
> >>
> >> On Fri, Oct 18, 2019 at 2:51 PM Haisheng Yuan <h....@alibaba-inc.com>
> wrote:
> >>>
> >>> Hi Stamatis,
> >>>
> >>> Thanks for your comment. I think my example didn't make it clear.
> >>>
> >>> When a logical operator is created, it doesn't have any physical,
> >>> propertyand it shouldn't have. When a physical operator is created,
> >>> e.g. in Enumerable convention, it only creates an intuitive traitset
> >>> with it, and requests it children the corresponding ones.
> >>>
> >>> For operators such as Join, Aggregate, Window, which may deliver
> >>> multiple different traitsets, when the parent operator is created and
> >>> request its traitset, it might be good to know what are the poosible
> >>> traitset that the child operator can deliver. e.g.
> >>>
> >>> SELECT DISTINCT c, b FROM
> >>>  ( SELECT R.c c, S.b b FROM R, S
> >>>        WHERE R.a=S.a and R.b=S.b and R.c=S.c) t;
> >>>
> >>> Suppose R is ordered by (c, b, a), and S is ordered by (b, c, a).
> >>> Here is the logical plan:
> >>> Aggregate
> >>>    +--- InnerJoin
> >>>                |--- TableScan on R
> >>>                +--- TableScan on S
> >>>
> >>> When we create a physical merge join for the inner join, it may just
> >>> have collation sorted on a,b,c. Then the aggreate on top of join will
> >>> request another sort on c,b, thus we miss the best plan. What we
> >>> can do is requesting all the order combinations, which is n!, like
> >>> how the Values operator does. But that is too much.
> >>>
> >>> If we can provide an approach that can minimize the possiple traitset
> >>> that the child operator may deliver, we can reduce the chance of
> missing
> >>> good plans. For the above query, the Aggregate operator can derive
> >>> possible traitsets that its child operator join can deliver, in which
> case,
> >>> the possiple traitsets of join is
> >>> 1. collation on (a,b,c) based on join condition,
> >>> 2. collation on (c,b,a) based on left child,
> >>> 3. collation on (b,c,a) based on right child
> >>> So we can request Aggregate sorted by (c,b) and Join sorted by (c,b,a).
> >>> The number of traiset requests and plan alternatives can be reduced.
> >>> The DerivedTraitSets can be used to derive the possible traitsets from
> >>> Join, and pass through Project, Filter etc...
> >>>
> >>> This is just an example of non-distributed system, for distributed
> system,
> >>> it can save much more by considering the possible distribution
> delivered
> >>> by child operators.
> >>>
> >>> One thing that concerns me is it highly relies on the traiset system
> of the
> >>> underlying physical system. Like Enumerable doesn't consider
> distribution,
> >>> because it is single-node system, but Hive/Flink are distributed
> system.
> >>> - Haisheng
> >>>
> >>> ------------------------------------------------------------------
> >>> 发件人：Stamatis Zampetakis<za...@gmail.com>
> >>> 日 期：2019年10月18日 14:53:41
> >>> 收件人：<de...@calcite.apache.org>
> >>> 主 题：Re: [DISCUSS] On-demand traitset request
> >>>
> >>> Hi Haisheng,
> >>>
> >>> This is an interesting topic but somehow in my mind I thought that this
> >>> mechanism is already in place.
> >>>
> >>> When an operator (logical or physical) is created its traitset is
> >>> determined in bottom-up fashion using the create
> >>> static factory method present in almost all operators. In my mind this
> is
> >>> in some sense the applicability function
> >>> mentioned in [1].
> >>>
> >>> Now during optimization we proceed in top-down manner and we request
> >>> certain traitsets from the operators.
> >>> If it happens and they contain already the requested traits nothing
> needs
> >>> to be done.
> >>>
> >>> In your example when we are about to create the sort-merge join we can
> >>> check what traitsets are present in the inputs
> >>> and if possible request those. Can you elaborate a bit more why do we
> need
> >>> a new type of metadata?
> >>>
> >>> Anyway if we cannot do it at the moment it makes sense to complete the
> >>> missing bits since what you are describing
> >>> was already mentioned in the original design of the Volcano optimizer
> [1].
> >>>
> >>> "If a move to be pursued is the exploration of a normal query
> processing
> >>> algorithm such as merge-join, its cost is calculated by the algorithm's
> >>> cost function. The algorithm's applicability function determines the
> >>> physical properly vectors for the algorithms inputs, and their costs
> and
> >>> optimal plans are found by invoking FindBestPlan for the inputs. For
> some
> >>> binary operators, the actual physical properties of the inputs are not
> as
> >>> important as the consistency of physical properties among the inputs.
> For
> >>> example, for a sort-based implementation of intersection, i.e., an
> >>> algorithm very similar to merge-join, any sort order of the two inputs
> will
> >>> suffice as long as the two inputs are sorted in the same way.
> Similarly,
> >>> for a parallel join, any partitioning of join inputs across multiple
> >>> processing nodes is acceptable if both inputs are partitioned using
> >>> Compatible partitioning rules. For these cases, the search engine
> permits
> >>> the optimizer implementor to specify a number of physical property
> vectors
> >>> to be tried. For example, for the intersection of two inputs R and S
> with
> >>> attributes A, B, and C where R is sorted on (A,B,C) and S is sorted on
> >>> (B,A,C), both these sort orders can be specified by the optimizer
> >>> implementor and will be optimized by the generated optimizer, while
> other
> >>> possible sort orders, e.g., (C,B,A), will be ignored. " [1]
> >>>
> >>> Best,
> >>> Stamatis
> >>>
> >>> [1]
> >>>
> https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Volcano-graefe.pdf
> >>>
> >>> On Fri, Oct 18, 2019 at 4:56 AM Haisheng Yuan <h....@alibaba-inc.com>
> >>> wrote:
> >>>
> >>>> TL;DR
> >>>> Both top-down physical TraitSet request and bottom-up TraitSet
> >>>> derivation have their strongth and weakness, we propose
> >>>> on-demand TraitSet request to combine the above two, to reduce
> >>>> the number of plan alternatives that are genereated, especially
> >>>> in distributed system.
> >>>>
> >>>> e.g.
> >>>> select * from foo join bar on f1=b1 and f2=b2 and f3=b3;
> >>>>
> >>>> In non-distributed system, we can generate a sort merge join,
> >>>> requesting foo sorted by f1,f2,f3 and bar sorted by b1,b2,b3.
> >>>> But if foo happens to be sorted by f3,f2,f1, we may miss the
> >>>> chance of making use of the delivered ordering of foo. Because
> >>>> if we require bar to be sorted by b3,b2,b1, we don't need to
> >>>> sort on foo anymore. There are so many choices, n!, not even
> >>>> considering asc/desc and null direction. We can't request all
> >>>> the possible traitsets in top-down way, and can't derive all the
> >>>> possible traitsets in bottom-up way either.
> >>>>
> >>>> We propose on-demand traitset request by adding a new type
> >>>> of metadata DerivedTraitSets into the built-in metadata system.
> >>>>
> >>>> List<RelTraitSet> deriveTraitSets(RelNode, RelMetadataQuery)
> >>>>
> >>>> In this metadata, every operator returns several possbile traitsets
> >>>> that may be derived from this operator.
> >>>>
> >>>> Using above query as an example, the tablescan on foo should
> >>>> return traiset with collation on f3, f2, f1.
> >>>>
> >>>> In physical implementation rules, e.g. the SortMergeJoinRule,
> >>>> it gets possible traitsets from both child operators, uses the join
> >>>> keys to eliminate useless traitsets, leaves out usefull traitsets,
> >>>> and requests corresponding traitset on the other child.
> >>>>
> >>>> This relies on the feature of AbstractConverter, which is turned
> >>>> off by default, due to performance issue [1].
> >>>>
> >>>> Thoughts?
> >>>>
> >>>> [1] https://issues.apache.org/jira/browse/CALCITE-2970
> >>>>
> >>>> Haisheng
> >>>>
> >>>>
> >>>
>
>

Re: [DISCUSS] On-demand traitset request

Posted by Xiening Dai <xn...@gmail.com>.

Yes, my major concern is the expanding of search space. If we request the permutation of (a, b, c) then we increase the search space 6 times. In a lot of cases where query is complex, we might not be able to finish the search at all. I think there’s gonna be some heuristics built in to guide the search, which, in some rare cases, would mean reducing the chance of finding the best plan. But essentially this is a trade off to make. An optimizer cannot guarantee to always find the best plan with time/space bound. That’s why we need hinting and other tools.

Regarding your comment - "In the current implementation of VolcanoPlanner, I feel the root issue of long planning time is not to explore all possible satisfying trait.”, I am not sure I understand. If the planner explore more possible traits, there could be better plan, but how would that reduce planning time? Can you please elaborate? Thanks.


> On Oct 31, 2019, at 11:10 PM, Jinfeng Ni <jn...@apache.org> wrote:
> 
> Hi Xiening,
> 
> "Let say if R and S doesn’t have sorting properties at all. In your
> case, we would end up adding enforcers for LHS and RHS to get
> collation (a, b, c). Then we would need another enforcer to get
> collation (b, c). This is a sub optimal plan as we could have use (b,
> c, a) for join."
> 
> In such case, for step 2 when MergeJoin request a permutation match of
> (a, b,c) on both it's input, it is not necessary to end up with
> collation (a, b, c) only. Since it request "permutation", MJ could ask
> all possible satisfying collations, which include (b, c, a). In other
> words, the steps I described did not exclude such plan.
> 
> You may argue it would increase the search space. However,  by
> limiting the search space, without explore all possible choice, we may
> lose the chance getting 'optimal' plan we want.  For instance, in the
> above example, the idea of passing "on demand" trait request (b,c)
> from Agg to MJ is to avoid unnecessary sort (b,c).  In cases where the
> join condition has good filtering, and such sort of join output could
> be quite cheap.  Yet in the plan enumeration, since we use "on demand"
> trait request from parent to guide the actions of MJ, I'm not sure if
> we may restrict the choices we consider in the legs of join, whose
> cardinality could be larger and play a bigger role in the overall
> cost.
> 
> In other words, by using "on demand" trait request, we may restrict
> the choices of plan, possibly in the some operators with larger data
> size.
> 
> In the current implementation of VolcanoPlanner, I feel the root issue
> of long planning time is not to explore all possible satisfying trait.
> It is actually the unnecessary of AbstractConverter, added to the
> equivalence class.
> 
> 
> On Fri, Oct 18, 2019 at 10:39 PM Xiening Dai <xn...@gmail.com> wrote:
>> 
>> Thanks for the sharing. I like the way you model this problem, Jinfeng.
>> 
>> There’s one minor issue with your example. Let say if R and S doesn’t have sorting properties at all. In your case, we would end up adding enforcers for LHS and RHS to get collation (a, b, c). Then we would need another enforcer to get collation (b, c). This is a sub optimal plan as we could have use (b, c, a) for join.
>> 
>> I think in step #2, the join operator would need to take the agg trait requirement into account. Then it would have two options -
>> 
>> 1) require *exact/super* match of  (b, c, a) or (c, b, a); this is to guarantee the join output would deliver the collation agg needs.
>> 2) require permutation match of (a, b, c); in such case, an enforcer might be needed for aggregation.
>> 
>> Eventually the cost model decides who is the winner.
>> 
>> There’s a fundamental difference between your model and Haisheng’s proposal. In Haisheng’s case, a rel node not only looks at its parent’s requirement, but also tries to get the potential traits its input could deliver. It would try to align them to eliminate unnecessary alternatives.
>> 
>> In above example, assuming R is (b, c, a) and S is (a, b, c), to implement option 1), we would generate two alternatives -
>> 
>> MergeJoin (b, c, a)
>>        TableScan R
>>        Sort(b, c, a)
>>                TableScan S
>> 
>> MergeJoin(c, b, a)
>>        Sort(c, b, a)
>>                TableScan R
>>        Sort(c, b, a)
>>                TableScan S
>> 
>> But if we look at the input traits and has the insight that R already delivers (b, c, a), we could decide to require (b, c, a) only and avoid generating the 2nd plan, which is definitely worse, and reduce the search space.
>> 
>> 
>>> On Oct 18, 2019, at 4:57 PM, Jinfeng Ni <jn...@apache.org> wrote:
>>> 
>>> A little bit of history.  In Drill,  when we first implemented
>>> Distribution trait's definition,  we allows both exact match and
>>> partial match in satisfy() method. This works fine for single-input
>>> operator such aggregation, however it leads to incorrect plan for join
>>> query, i.e LHS shuffle with (a, b),  RHS shuffle with (a) .  At that
>>> time, we removed partial match, and use exact match only. Yet this
>>> changes leads to unnecessary additional exchange.  To mitigate this
>>> problem, in join physical operator, for a join key (a, b, c),  we
>>> enumerate different distribution requests, yet this lead to more space
>>> to explore and significantly increase planning time (which is probably
>>> what Haisheng also experienced).  When I look back, I feel probably
>>> what we miss is the "coordination" step in the join operator, because
>>> if we relax the requirement of satisfy(), for multi-input operators,
>>> we have to enforce some "coordination", to make sure multiple input's
>>> trait could work together properly.
>>> 
>>> 
>>> 
>>> On Fri, Oct 18, 2019 at 4:38 PM Jinfeng Ni <jn...@apache.org> wrote:
>>>> 
>>>> This is an interesting topic. Thanks for bringing up this issue.
>>>> 
>>>> My understanding of Volcano planner is it works in a top-down search
>>>> mode (the parent asks for certain trait of its child), while the trait
>>>> propagates in a bottom-up way, as Stamatis explained.
>>>> 
>>>> IMHO, the issue comes down to the definition of RelTrait, how to
>>>> determine if a trait A could satisfy a request asking for trait B,
>>>> that is, how RelTrait.satisfies() method is implemented.
>>>> 
>>>> Let's first clarify different situations, using collation as example.
>>>> 1) The collation is requested by query's outmost ORDER BY clause.
>>>>  - The generated plan has to have "exact match", i.e same collation
>>>> (same column sequence), or "super match" .
>>>> exact match:   (a, b)  satisfy  (a, b)
>>>> super match:   (a, b, c)  satisfy (a, b)
>>>> 
>>>> 2) The collation is requested by operand with single input, such as
>>>> sort-based Aggregation.
>>>>  - In such case, a "permutation match" is sufficient.
>>>> For instance,  for Aggregation (b,c),  input with collation (c, b)
>>>> could satisfy the requirement.
>>>> permutation match:  (b, c) satisfy (c, b).         (c, b) satisfy (c, b)
>>>> permutation match:  (b, c, a) satisfy (c, b).     (c, b, a) satisfy (c, b)
>>>> 
>>>> 3) The collation is requested by operand with >= 2 inputs, such as
>>>> sort-based MergeJoin.
>>>> - A permutation match is sufficient for each input
>>>> - MergeJoin has to do coordination, after input's trait propagates
>>>> upwards. In other words,  ensure both inputs's permutation match are
>>>> actually same sequence. Otherwise,  enforcer could be inserted upon
>>>> each input, and the planner generates two plans and let the cost
>>>> decide.
>>>> 
>>>> For the first case, this is how today's RelCollation's satisfy()
>>>> method is implemented.
>>>> 
>>>> For the second / third cases, use Haisheng's example,
>>>> 
>>>> SELECT DISTINCT c, b FROM
>>>> ( SELECT R.c c, S.b b FROM R, S
>>>>       WHERE R.a=S.a and R.b=S.b and R.c=S.c) t;
>>>> 
>>>> Aggregate . (c, b)
>>>>   +--- MergeJoin . (a, b, c)
>>>>               |--- TableScan on R
>>>>               +--- TableScan on S
>>>> 
>>>> Here is the steps that might take place in the planner:
>>>> 
>>>> 1) Aggregate request permutation match collation (c, b)
>>>> 2) MergeJoin request a permutation match of (a, b,c) on both it's input
>>>> 3) R respond with collation (c, b, a), which satisfy MergeJoin's LHS requirement
>>>> 4) S respond with collation (b, c, a), which satisfy MergeJoins' RHS requirement
>>>> 5) MergeJoin do a coordination o LHS, RHS, and generate two possible plans
>>>>  MJ1:   Insert a sort of (c, b, a) on RHS.  This MJ operator now has
>>>> collation of (c, b, a)
>>>>  MJ2:   Insert a sort of (b, c, a) on LHS.  This MJ operator now has
>>>> collation of (b, c, a)
>>>> 6) MJ1 and MJ2 could both satisfy  permutation match request in step
>>>> 1, leading to two possible plans:
>>>> Agg1:  with input of MJ1
>>>> Agg2:  with input of MJ2
>>>> 7) planner chooses a best plan based on cost of Agg1 and Agg2.
>>>> 
>>>> I should point that the enforcer sort inserted in step 5 could help
>>>> remove redundant sort in its input, if the input's collation is
>>>> obtained from sort, by invoking Calcite's SortRemove Rule.
>>>> 
>>>> The above only considers the column sequence. The DESC/ASC, NULL
>>>> FIRST/LAST will add more complexity, but we probably use similar idea.
>>>> 
>>>> In summary,  we need :
>>>> 1) redefine collation trait's satisfy() policy,  exact match, super
>>>> match, permutation match,
>>>> 2) different physical operator applies different trait matching
>>>> policy, depending on operator's # of inputs, and algorithm
>>>> implementation.
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Fri, Oct 18, 2019 at 2:51 PM Haisheng Yuan <h....@alibaba-inc.com> wrote:
>>>>> 
>>>>> Hi Stamatis,
>>>>> 
>>>>> Thanks for your comment. I think my example didn't make it clear.
>>>>> 
>>>>> When a logical operator is created, it doesn't have any physical,
>>>>> propertyand it shouldn't have. When a physical operator is created,
>>>>> e.g. in Enumerable convention, it only creates an intuitive traitset
>>>>> with it, and requests it children the corresponding ones.
>>>>> 
>>>>> For operators such as Join, Aggregate, Window, which may deliver
>>>>> multiple different traitsets, when the parent operator is created and
>>>>> request its traitset, it might be good to know what are the poosible
>>>>> traitset that the child operator can deliver. e.g.
>>>>> 
>>>>> SELECT DISTINCT c, b FROM
>>>>> ( SELECT R.c c, S.b b FROM R, S
>>>>>       WHERE R.a=S.a and R.b=S.b and R.c=S.c) t;
>>>>> 
>>>>> Suppose R is ordered by (c, b, a), and S is ordered by (b, c, a).
>>>>> Here is the logical plan:
>>>>> Aggregate
>>>>>   +--- InnerJoin
>>>>>               |--- TableScan on R
>>>>>               +--- TableScan on S
>>>>> 
>>>>> When we create a physical merge join for the inner join, it may just
>>>>> have collation sorted on a,b,c. Then the aggreate on top of join will
>>>>> request another sort on c,b, thus we miss the best plan. What we
>>>>> can do is requesting all the order combinations, which is n!, like
>>>>> how the Values operator does. But that is too much.
>>>>> 
>>>>> If we can provide an approach that can minimize the possiple traitset
>>>>> that the child operator may deliver, we can reduce the chance of missing
>>>>> good plans. For the above query, the Aggregate operator can derive
>>>>> possible traitsets that its child operator join can deliver, in which case,
>>>>> the possiple traitsets of join is
>>>>> 1. collation on (a,b,c) based on join condition,
>>>>> 2. collation on (c,b,a) based on left child,
>>>>> 3. collation on (b,c,a) based on right child
>>>>> So we can request Aggregate sorted by (c,b) and Join sorted by (c,b,a).
>>>>> The number of traiset requests and plan alternatives can be reduced.
>>>>> The DerivedTraitSets can be used to derive the possible traitsets from
>>>>> Join, and pass through Project, Filter etc...
>>>>> 
>>>>> This is just an example of non-distributed system, for distributed system,
>>>>> it can save much more by considering the possible distribution delivered
>>>>> by child operators.
>>>>> 
>>>>> One thing that concerns me is it highly relies on the traiset system of the
>>>>> underlying physical system. Like Enumerable doesn't consider distribution,
>>>>> because it is single-node system, but Hive/Flink are distributed system.
>>>>> - Haisheng
>>>>> 
>>>>> ------------------------------------------------------------------
>>>>> 发件人：Stamatis Zampetakis<za...@gmail.com>
>>>>> 日　期：2019年10月18日 14:53:41
>>>>> 收件人：<de...@calcite.apache.org>
>>>>> 主　题：Re: [DISCUSS] On-demand traitset request
>>>>> 
>>>>> Hi Haisheng,
>>>>> 
>>>>> This is an interesting topic but somehow in my mind I thought that this
>>>>> mechanism is already in place.
>>>>> 
>>>>> When an operator (logical or physical) is created its traitset is
>>>>> determined in bottom-up fashion using the create
>>>>> static factory method present in almost all operators. In my mind this is
>>>>> in some sense the applicability function
>>>>> mentioned in [1].
>>>>> 
>>>>> Now during optimization we proceed in top-down manner and we request
>>>>> certain traitsets from the operators.
>>>>> If it happens and they contain already the requested traits nothing needs
>>>>> to be done.
>>>>> 
>>>>> In your example when we are about to create the sort-merge join we can
>>>>> check what traitsets are present in the inputs
>>>>> and if possible request those. Can you elaborate a bit more why do we need
>>>>> a new type of metadata?
>>>>> 
>>>>> Anyway if we cannot do it at the moment it makes sense to complete the
>>>>> missing bits since what you are describing
>>>>> was already mentioned in the original design of the Volcano optimizer [1].
>>>>> 
>>>>> "If a move to be pursued is the exploration of a normal query processing
>>>>> algorithm such as merge-join, its cost is calculated by the algorithm's
>>>>> cost function. The algorithm's applicability function determines the
>>>>> physical properly vectors for the algorithms inputs, and their costs and
>>>>> optimal plans are found by invoking FindBestPlan for the inputs. For some
>>>>> binary operators, the actual physical properties of the inputs are not as
>>>>> important as the consistency of physical properties among the inputs. For
>>>>> example, for a sort-based implementation of intersection, i.e., an
>>>>> algorithm very similar to merge-join, any sort order of the two inputs will
>>>>> suffice as long as the two inputs are sorted in the same way. Similarly,
>>>>> for a parallel join, any partitioning of join inputs across multiple
>>>>> processing nodes is acceptable if both inputs are partitioned using
>>>>> Compatible partitioning rules. For these cases, the search engine permits
>>>>> the optimizer implementor to specify a number of physical property vectors
>>>>> to be tried. For example, for the intersection of two inputs R and S with
>>>>> attributes A, B, and C where R is sorted on (A,B,C) and S is sorted on
>>>>> (B,A,C), both these sort orders can be specified by the optimizer
>>>>> implementor and will be optimized by the generated optimizer, while other
>>>>> possible sort orders, e.g., (C,B,A), will be ignored. " [1]
>>>>> 
>>>>> Best,
>>>>> Stamatis
>>>>> 
>>>>> [1]
>>>>> https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Volcano-graefe.pdf
>>>>> 
>>>>> On Fri, Oct 18, 2019 at 4:56 AM Haisheng Yuan <h....@alibaba-inc.com>
>>>>> wrote:
>>>>> 
>>>>>> TL;DR
>>>>>> Both top-down physical TraitSet request and bottom-up TraitSet
>>>>>> derivation have their strongth and weakness, we propose
>>>>>> on-demand TraitSet request to combine the above two, to reduce
>>>>>> the number of plan alternatives that are genereated, especially
>>>>>> in distributed system.
>>>>>> 
>>>>>> e.g.
>>>>>> select * from foo join bar on f1=b1 and f2=b2 and f3=b3;
>>>>>> 
>>>>>> In non-distributed system, we can generate a sort merge join,
>>>>>> requesting foo sorted by f1,f2,f3 and bar sorted by b1,b2,b3.
>>>>>> But if foo happens to be sorted by f3,f2,f1, we may miss the
>>>>>> chance of making use of the delivered ordering of foo. Because
>>>>>> if we require bar to be sorted by b3,b2,b1, we don't need to
>>>>>> sort on foo anymore. There are so many choices, n!, not even
>>>>>> considering asc/desc and null direction. We can't request all
>>>>>> the possible traitsets in top-down way, and can't derive all the
>>>>>> possible traitsets in bottom-up way either.
>>>>>> 
>>>>>> We propose on-demand traitset request by adding a new type
>>>>>> of metadata DerivedTraitSets into the built-in metadata system.
>>>>>> 
>>>>>> List<RelTraitSet> deriveTraitSets(RelNode, RelMetadataQuery)
>>>>>> 
>>>>>> In this metadata, every operator returns several possbile traitsets
>>>>>> that may be derived from this operator.
>>>>>> 
>>>>>> Using above query as an example, the tablescan on foo should
>>>>>> return traiset with collation on f3, f2, f1.
>>>>>> 
>>>>>> In physical implementation rules, e.g. the SortMergeJoinRule,
>>>>>> it gets possible traitsets from both child operators, uses the join
>>>>>> keys to eliminate useless traitsets, leaves out usefull traitsets,
>>>>>> and requests corresponding traitset on the other child.
>>>>>> 
>>>>>> This relies on the feature of AbstractConverter, which is turned
>>>>>> off by default, due to performance issue [1].
>>>>>> 
>>>>>> Thoughts?
>>>>>> 
>>>>>> [1] https://issues.apache.org/jira/browse/CALCITE-2970
>>>>>> 
>>>>>> Haisheng
>>>>>> 
>>>>>> 
>>>>> 
>>

Re: [DISCUSS] On-demand traitset request

Posted by Jinfeng Ni <jn...@apache.org>.

Hi Xiening,

"Let say if R and S doesn’t have sorting properties at all. In your
case, we would end up adding enforcers for LHS and RHS to get
collation (a, b, c). Then we would need another enforcer to get
collation (b, c). This is a sub optimal plan as we could have use (b,
c, a) for join."

In such case, for step 2 when MergeJoin request a permutation match of
(a, b,c) on both it's input, it is not necessary to end up with
collation (a, b, c) only. Since it request "permutation", MJ could ask
all possible satisfying collations, which include (b, c, a). In other
words, the steps I described did not exclude such plan.

You may argue it would increase the search space. However,  by
limiting the search space, without explore all possible choice, we may
lose the chance getting 'optimal' plan we want.  For instance, in the
above example, the idea of passing "on demand" trait request (b,c)
from Agg to MJ is to avoid unnecessary sort (b,c).  In cases where the
join condition has good filtering, and such sort of join output could
be quite cheap.  Yet in the plan enumeration, since we use "on demand"
trait request from parent to guide the actions of MJ, I'm not sure if
we may restrict the choices we consider in the legs of join, whose
cardinality could be larger and play a bigger role in the overall
cost.

In other words, by using "on demand" trait request, we may restrict
the choices of plan, possibly in the some operators with larger data
size.

In the current implementation of VolcanoPlanner, I feel the root issue
of long planning time is not to explore all possible satisfying trait.
It is actually the unnecessary of AbstractConverter, added to the
equivalence class.


On Fri, Oct 18, 2019 at 10:39 PM Xiening Dai <xn...@gmail.com> wrote:
>
> Thanks for the sharing. I like the way you model this problem, Jinfeng.
>
> There’s one minor issue with your example. Let say if R and S doesn’t have sorting properties at all. In your case, we would end up adding enforcers for LHS and RHS to get collation (a, b, c). Then we would need another enforcer to get collation (b, c). This is a sub optimal plan as we could have use (b, c, a) for join.
>
> I think in step #2, the join operator would need to take the agg trait requirement into account. Then it would have two options -
>
> 1) require *exact/super* match of  (b, c, a) or (c, b, a); this is to guarantee the join output would deliver the collation agg needs.
> 2) require permutation match of (a, b, c); in such case, an enforcer might be needed for aggregation.
>
> Eventually the cost model decides who is the winner.
>
> There’s a fundamental difference between your model and Haisheng’s proposal. In Haisheng’s case, a rel node not only looks at its parent’s requirement, but also tries to get the potential traits its input could deliver. It would try to align them to eliminate unnecessary alternatives.
>
> In above example, assuming R is (b, c, a) and S is (a, b, c), to implement option 1), we would generate two alternatives -
>
> MergeJoin (b, c, a)
>         TableScan R
>         Sort(b, c, a)
>                 TableScan S
>
> MergeJoin(c, b, a)
>         Sort(c, b, a)
>                 TableScan R
>         Sort(c, b, a)
>                 TableScan S
>
> But if we look at the input traits and has the insight that R already delivers (b, c, a), we could decide to require (b, c, a) only and avoid generating the 2nd plan, which is definitely worse, and reduce the search space.
>
>
> > On Oct 18, 2019, at 4:57 PM, Jinfeng Ni <jn...@apache.org> wrote:
> >
> > A little bit of history.  In Drill,  when we first implemented
> > Distribution trait's definition,  we allows both exact match and
> > partial match in satisfy() method. This works fine for single-input
> > operator such aggregation, however it leads to incorrect plan for join
> > query, i.e LHS shuffle with (a, b),  RHS shuffle with (a) .  At that
> > time, we removed partial match, and use exact match only. Yet this
> > changes leads to unnecessary additional exchange.  To mitigate this
> > problem, in join physical operator, for a join key (a, b, c),  we
> > enumerate different distribution requests, yet this lead to more space
> > to explore and significantly increase planning time (which is probably
> > what Haisheng also experienced).  When I look back, I feel probably
> > what we miss is the "coordination" step in the join operator, because
> > if we relax the requirement of satisfy(), for multi-input operators,
> > we have to enforce some "coordination", to make sure multiple input's
> > trait could work together properly.
> >
> >
> >
> > On Fri, Oct 18, 2019 at 4:38 PM Jinfeng Ni <jn...@apache.org> wrote:
> >>
> >> This is an interesting topic. Thanks for bringing up this issue.
> >>
> >> My understanding of Volcano planner is it works in a top-down search
> >> mode (the parent asks for certain trait of its child), while the trait
> >> propagates in a bottom-up way, as Stamatis explained.
> >>
> >> IMHO, the issue comes down to the definition of RelTrait, how to
> >> determine if a trait A could satisfy a request asking for trait B,
> >> that is, how RelTrait.satisfies() method is implemented.
> >>
> >> Let's first clarify different situations, using collation as example.
> >> 1) The collation is requested by query's outmost ORDER BY clause.
> >>   - The generated plan has to have "exact match", i.e same collation
> >> (same column sequence), or "super match" .
> >> exact match:   (a, b)  satisfy  (a, b)
> >> super match:   (a, b, c)  satisfy (a, b)
> >>
> >> 2) The collation is requested by operand with single input, such as
> >> sort-based Aggregation.
> >>   - In such case, a "permutation match" is sufficient.
> >> For instance,  for Aggregation (b,c),  input with collation (c, b)
> >> could satisfy the requirement.
> >> permutation match:  (b, c) satisfy (c, b).         (c, b) satisfy (c, b)
> >> permutation match:  (b, c, a) satisfy (c, b).     (c, b, a) satisfy (c, b)
> >>
> >> 3) The collation is requested by operand with >= 2 inputs, such as
> >> sort-based MergeJoin.
> >>  - A permutation match is sufficient for each input
> >>  - MergeJoin has to do coordination, after input's trait propagates
> >> upwards. In other words,  ensure both inputs's permutation match are
> >> actually same sequence. Otherwise,  enforcer could be inserted upon
> >> each input, and the planner generates two plans and let the cost
> >> decide.
> >>
> >> For the first case, this is how today's RelCollation's satisfy()
> >> method is implemented.
> >>
> >> For the second / third cases, use Haisheng's example,
> >>
> >> SELECT DISTINCT c, b FROM
> >>  ( SELECT R.c c, S.b b FROM R, S
> >>        WHERE R.a=S.a and R.b=S.b and R.c=S.c) t;
> >>
> >> Aggregate . (c, b)
> >>    +--- MergeJoin . (a, b, c)
> >>                |--- TableScan on R
> >>                +--- TableScan on S
> >>
> >> Here is the steps that might take place in the planner:
> >>
> >> 1) Aggregate request permutation match collation (c, b)
> >> 2) MergeJoin request a permutation match of (a, b,c) on both it's input
> >> 3) R respond with collation (c, b, a), which satisfy MergeJoin's LHS requirement
> >> 4) S respond with collation (b, c, a), which satisfy MergeJoins' RHS requirement
> >> 5) MergeJoin do a coordination o LHS, RHS, and generate two possible plans
> >>   MJ1:   Insert a sort of (c, b, a) on RHS.  This MJ operator now has
> >> collation of (c, b, a)
> >>   MJ2:   Insert a sort of (b, c, a) on LHS.  This MJ operator now has
> >> collation of (b, c, a)
> >> 6) MJ1 and MJ2 could both satisfy  permutation match request in step
> >> 1, leading to two possible plans:
> >>  Agg1:  with input of MJ1
> >>  Agg2:  with input of MJ2
> >> 7) planner chooses a best plan based on cost of Agg1 and Agg2.
> >>
> >> I should point that the enforcer sort inserted in step 5 could help
> >> remove redundant sort in its input, if the input's collation is
> >> obtained from sort, by invoking Calcite's SortRemove Rule.
> >>
> >> The above only considers the column sequence. The DESC/ASC, NULL
> >> FIRST/LAST will add more complexity, but we probably use similar idea.
> >>
> >> In summary,  we need :
> >>  1) redefine collation trait's satisfy() policy,  exact match, super
> >> match, permutation match,
> >>  2) different physical operator applies different trait matching
> >> policy, depending on operator's # of inputs, and algorithm
> >> implementation.
> >>
> >>
> >>
> >>
> >>
> >> On Fri, Oct 18, 2019 at 2:51 PM Haisheng Yuan <h....@alibaba-inc.com> wrote:
> >>>
> >>> Hi Stamatis,
> >>>
> >>> Thanks for your comment. I think my example didn't make it clear.
> >>>
> >>> When a logical operator is created, it doesn't have any physical,
> >>> propertyand it shouldn't have. When a physical operator is created,
> >>> e.g. in Enumerable convention, it only creates an intuitive traitset
> >>> with it, and requests it children the corresponding ones.
> >>>
> >>> For operators such as Join, Aggregate, Window, which may deliver
> >>> multiple different traitsets, when the parent operator is created and
> >>> request its traitset, it might be good to know what are the poosible
> >>> traitset that the child operator can deliver. e.g.
> >>>
> >>> SELECT DISTINCT c, b FROM
> >>>  ( SELECT R.c c, S.b b FROM R, S
> >>>        WHERE R.a=S.a and R.b=S.b and R.c=S.c) t;
> >>>
> >>> Suppose R is ordered by (c, b, a), and S is ordered by (b, c, a).
> >>> Here is the logical plan:
> >>> Aggregate
> >>>    +--- InnerJoin
> >>>                |--- TableScan on R
> >>>                +--- TableScan on S
> >>>
> >>> When we create a physical merge join for the inner join, it may just
> >>> have collation sorted on a,b,c. Then the aggreate on top of join will
> >>> request another sort on c,b, thus we miss the best plan. What we
> >>> can do is requesting all the order combinations, which is n!, like
> >>> how the Values operator does. But that is too much.
> >>>
> >>> If we can provide an approach that can minimize the possiple traitset
> >>> that the child operator may deliver, we can reduce the chance of missing
> >>> good plans. For the above query, the Aggregate operator can derive
> >>> possible traitsets that its child operator join can deliver, in which case,
> >>> the possiple traitsets of join is
> >>> 1. collation on (a,b,c) based on join condition,
> >>> 2. collation on (c,b,a) based on left child,
> >>> 3. collation on (b,c,a) based on right child
> >>> So we can request Aggregate sorted by (c,b) and Join sorted by (c,b,a).
> >>> The number of traiset requests and plan alternatives can be reduced.
> >>> The DerivedTraitSets can be used to derive the possible traitsets from
> >>> Join, and pass through Project, Filter etc...
> >>>
> >>> This is just an example of non-distributed system, for distributed system,
> >>> it can save much more by considering the possible distribution delivered
> >>> by child operators.
> >>>
> >>> One thing that concerns me is it highly relies on the traiset system of the
> >>> underlying physical system. Like Enumerable doesn't consider distribution,
> >>> because it is single-node system, but Hive/Flink are distributed system.
> >>> - Haisheng
> >>>
> >>> ------------------------------------------------------------------
> >>> 发件人：Stamatis Zampetakis<za...@gmail.com>
> >>> 日　期：2019年10月18日 14:53:41
> >>> 收件人：<de...@calcite.apache.org>
> >>> 主　题：Re: [DISCUSS] On-demand traitset request
> >>>
> >>> Hi Haisheng,
> >>>
> >>> This is an interesting topic but somehow in my mind I thought that this
> >>> mechanism is already in place.
> >>>
> >>> When an operator (logical or physical) is created its traitset is
> >>> determined in bottom-up fashion using the create
> >>> static factory method present in almost all operators. In my mind this is
> >>> in some sense the applicability function
> >>> mentioned in [1].
> >>>
> >>> Now during optimization we proceed in top-down manner and we request
> >>> certain traitsets from the operators.
> >>> If it happens and they contain already the requested traits nothing needs
> >>> to be done.
> >>>
> >>> In your example when we are about to create the sort-merge join we can
> >>> check what traitsets are present in the inputs
> >>> and if possible request those. Can you elaborate a bit more why do we need
> >>> a new type of metadata?
> >>>
> >>> Anyway if we cannot do it at the moment it makes sense to complete the
> >>> missing bits since what you are describing
> >>> was already mentioned in the original design of the Volcano optimizer [1].
> >>>
> >>> "If a move to be pursued is the exploration of a normal query processing
> >>> algorithm such as merge-join, its cost is calculated by the algorithm's
> >>> cost function. The algorithm's applicability function determines the
> >>> physical properly vectors for the algorithms inputs, and their costs and
> >>> optimal plans are found by invoking FindBestPlan for the inputs. For some
> >>> binary operators, the actual physical properties of the inputs are not as
> >>> important as the consistency of physical properties among the inputs. For
> >>> example, for a sort-based implementation of intersection, i.e., an
> >>> algorithm very similar to merge-join, any sort order of the two inputs will
> >>> suffice as long as the two inputs are sorted in the same way. Similarly,
> >>> for a parallel join, any partitioning of join inputs across multiple
> >>> processing nodes is acceptable if both inputs are partitioned using
> >>> Compatible partitioning rules. For these cases, the search engine permits
> >>> the optimizer implementor to specify a number of physical property vectors
> >>> to be tried. For example, for the intersection of two inputs R and S with
> >>> attributes A, B, and C where R is sorted on (A,B,C) and S is sorted on
> >>> (B,A,C), both these sort orders can be specified by the optimizer
> >>> implementor and will be optimized by the generated optimizer, while other
> >>> possible sort orders, e.g., (C,B,A), will be ignored. " [1]
> >>>
> >>> Best,
> >>> Stamatis
> >>>
> >>> [1]
> >>> https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Volcano-graefe.pdf
> >>>
> >>> On Fri, Oct 18, 2019 at 4:56 AM Haisheng Yuan <h....@alibaba-inc.com>
> >>> wrote:
> >>>
> >>>> TL;DR
> >>>> Both top-down physical TraitSet request and bottom-up TraitSet
> >>>> derivation have their strongth and weakness, we propose
> >>>> on-demand TraitSet request to combine the above two, to reduce
> >>>> the number of plan alternatives that are genereated, especially
> >>>> in distributed system.
> >>>>
> >>>> e.g.
> >>>> select * from foo join bar on f1=b1 and f2=b2 and f3=b3;
> >>>>
> >>>> In non-distributed system, we can generate a sort merge join,
> >>>> requesting foo sorted by f1,f2,f3 and bar sorted by b1,b2,b3.
> >>>> But if foo happens to be sorted by f3,f2,f1, we may miss the
> >>>> chance of making use of the delivered ordering of foo. Because
> >>>> if we require bar to be sorted by b3,b2,b1, we don't need to
> >>>> sort on foo anymore. There are so many choices, n!, not even
> >>>> considering asc/desc and null direction. We can't request all
> >>>> the possible traitsets in top-down way, and can't derive all the
> >>>> possible traitsets in bottom-up way either.
> >>>>
> >>>> We propose on-demand traitset request by adding a new type
> >>>> of metadata DerivedTraitSets into the built-in metadata system.
> >>>>
> >>>> List<RelTraitSet> deriveTraitSets(RelNode, RelMetadataQuery)
> >>>>
> >>>> In this metadata, every operator returns several possbile traitsets
> >>>> that may be derived from this operator.
> >>>>
> >>>> Using above query as an example, the tablescan on foo should
> >>>> return traiset with collation on f3, f2, f1.
> >>>>
> >>>> In physical implementation rules, e.g. the SortMergeJoinRule,
> >>>> it gets possible traitsets from both child operators, uses the join
> >>>> keys to eliminate useless traitsets, leaves out usefull traitsets,
> >>>> and requests corresponding traitset on the other child.
> >>>>
> >>>> This relies on the feature of AbstractConverter, which is turned
> >>>> off by default, due to performance issue [1].
> >>>>
> >>>> Thoughts?
> >>>>
> >>>> [1] https://issues.apache.org/jira/browse/CALCITE-2970
> >>>>
> >>>> Haisheng
> >>>>
> >>>>
> >>>
>

Re: [DISCUSS] On-demand traitset request

Posted by Xiening Dai <xn...@gmail.com>.

Thanks for the sharing. I like the way you model this problem, Jinfeng.

There’s one minor issue with your example. Let say if R and S doesn’t have sorting properties at all. In your case, we would end up adding enforcers for LHS and RHS to get collation (a, b, c). Then we would need another enforcer to get collation (b, c). This is a sub optimal plan as we could have use (b, c, a) for join.

I think in step #2, the join operator would need to take the agg trait requirement into account. Then it would have two options -

1) require *exact/super* match of  (b, c, a) or (c, b, a); this is to guarantee the join output would deliver the collation agg needs.
2) require permutation match of (a, b, c); in such case, an enforcer might be needed for aggregation.

Eventually the cost model decides who is the winner.

There’s a fundamental difference between your model and Haisheng’s proposal. In Haisheng’s case, a rel node not only looks at its parent’s requirement, but also tries to get the potential traits its input could deliver. It would try to align them to eliminate unnecessary alternatives.

In above example, assuming R is (b, c, a) and S is (a, b, c), to implement option 1), we would generate two alternatives -

MergeJoin (b, c, a)
	TableScan R
	Sort(b, c, a)
		TableScan S

MergeJoin(c, b, a)
	Sort(c, b, a)
		TableScan R
	Sort(c, b, a)
		TableScan S

But if we look at the input traits and has the insight that R already delivers (b, c, a), we could decide to require (b, c, a) only and avoid generating the 2nd plan, which is definitely worse, and reduce the search space. 


> On Oct 18, 2019, at 4:57 PM, Jinfeng Ni <jn...@apache.org> wrote:
> 
> A little bit of history.  In Drill,  when we first implemented
> Distribution trait's definition,  we allows both exact match and
> partial match in satisfy() method. This works fine for single-input
> operator such aggregation, however it leads to incorrect plan for join
> query, i.e LHS shuffle with (a, b),  RHS shuffle with (a) .  At that
> time, we removed partial match, and use exact match only. Yet this
> changes leads to unnecessary additional exchange.  To mitigate this
> problem, in join physical operator, for a join key (a, b, c),  we
> enumerate different distribution requests, yet this lead to more space
> to explore and significantly increase planning time (which is probably
> what Haisheng also experienced).  When I look back, I feel probably
> what we miss is the "coordination" step in the join operator, because
> if we relax the requirement of satisfy(), for multi-input operators,
> we have to enforce some "coordination", to make sure multiple input's
> trait could work together properly.
> 
> 
> 
> On Fri, Oct 18, 2019 at 4:38 PM Jinfeng Ni <jn...@apache.org> wrote:
>> 
>> This is an interesting topic. Thanks for bringing up this issue.
>> 
>> My understanding of Volcano planner is it works in a top-down search
>> mode (the parent asks for certain trait of its child), while the trait
>> propagates in a bottom-up way, as Stamatis explained.
>> 
>> IMHO, the issue comes down to the definition of RelTrait, how to
>> determine if a trait A could satisfy a request asking for trait B,
>> that is, how RelTrait.satisfies() method is implemented.
>> 
>> Let's first clarify different situations, using collation as example.
>> 1) The collation is requested by query's outmost ORDER BY clause.
>>   - The generated plan has to have "exact match", i.e same collation
>> (same column sequence), or "super match" .
>> exact match:   (a, b)  satisfy  (a, b)
>> super match:   (a, b, c)  satisfy (a, b)
>> 
>> 2) The collation is requested by operand with single input, such as
>> sort-based Aggregation.
>>   - In such case, a "permutation match" is sufficient.
>> For instance,  for Aggregation (b,c),  input with collation (c, b)
>> could satisfy the requirement.
>> permutation match:  (b, c) satisfy (c, b).         (c, b) satisfy (c, b)
>> permutation match:  (b, c, a) satisfy (c, b).     (c, b, a) satisfy (c, b)
>> 
>> 3) The collation is requested by operand with >= 2 inputs, such as
>> sort-based MergeJoin.
>>  - A permutation match is sufficient for each input
>>  - MergeJoin has to do coordination, after input's trait propagates
>> upwards. In other words,  ensure both inputs's permutation match are
>> actually same sequence. Otherwise,  enforcer could be inserted upon
>> each input, and the planner generates two plans and let the cost
>> decide.
>> 
>> For the first case, this is how today's RelCollation's satisfy()
>> method is implemented.
>> 
>> For the second / third cases, use Haisheng's example,
>> 
>> SELECT DISTINCT c, b FROM
>>  ( SELECT R.c c, S.b b FROM R, S
>>        WHERE R.a=S.a and R.b=S.b and R.c=S.c) t;
>> 
>> Aggregate . (c, b)
>>    +--- MergeJoin . (a, b, c)
>>                |--- TableScan on R
>>                +--- TableScan on S
>> 
>> Here is the steps that might take place in the planner:
>> 
>> 1) Aggregate request permutation match collation (c, b)
>> 2) MergeJoin request a permutation match of (a, b,c) on both it's input
>> 3) R respond with collation (c, b, a), which satisfy MergeJoin's LHS requirement
>> 4) S respond with collation (b, c, a), which satisfy MergeJoins' RHS requirement
>> 5) MergeJoin do a coordination o LHS, RHS, and generate two possible plans
>>   MJ1:   Insert a sort of (c, b, a) on RHS.  This MJ operator now has
>> collation of (c, b, a)
>>   MJ2:   Insert a sort of (b, c, a) on LHS.  This MJ operator now has
>> collation of (b, c, a)
>> 6) MJ1 and MJ2 could both satisfy  permutation match request in step
>> 1, leading to two possible plans:
>>  Agg1:  with input of MJ1
>>  Agg2:  with input of MJ2
>> 7) planner chooses a best plan based on cost of Agg1 and Agg2.
>> 
>> I should point that the enforcer sort inserted in step 5 could help
>> remove redundant sort in its input, if the input's collation is
>> obtained from sort, by invoking Calcite's SortRemove Rule.
>> 
>> The above only considers the column sequence. The DESC/ASC, NULL
>> FIRST/LAST will add more complexity, but we probably use similar idea.
>> 
>> In summary,  we need :
>>  1) redefine collation trait's satisfy() policy,  exact match, super
>> match, permutation match,
>>  2) different physical operator applies different trait matching
>> policy, depending on operator's # of inputs, and algorithm
>> implementation.
>> 
>> 
>> 
>> 
>> 
>> On Fri, Oct 18, 2019 at 2:51 PM Haisheng Yuan <h....@alibaba-inc.com> wrote:
>>> 
>>> Hi Stamatis,
>>> 
>>> Thanks for your comment. I think my example didn't make it clear.
>>> 
>>> When a logical operator is created, it doesn't have any physical,
>>> propertyand it shouldn't have. When a physical operator is created,
>>> e.g. in Enumerable convention, it only creates an intuitive traitset
>>> with it, and requests it children the corresponding ones.
>>> 
>>> For operators such as Join, Aggregate, Window, which may deliver
>>> multiple different traitsets, when the parent operator is created and
>>> request its traitset, it might be good to know what are the poosible
>>> traitset that the child operator can deliver. e.g.
>>> 
>>> SELECT DISTINCT c, b FROM
>>>  ( SELECT R.c c, S.b b FROM R, S
>>>        WHERE R.a=S.a and R.b=S.b and R.c=S.c) t;
>>> 
>>> Suppose R is ordered by (c, b, a), and S is ordered by (b, c, a).
>>> Here is the logical plan:
>>> Aggregate
>>>    +--- InnerJoin
>>>                |--- TableScan on R
>>>                +--- TableScan on S
>>> 
>>> When we create a physical merge join for the inner join, it may just
>>> have collation sorted on a,b,c. Then the aggreate on top of join will
>>> request another sort on c,b, thus we miss the best plan. What we
>>> can do is requesting all the order combinations, which is n!, like
>>> how the Values operator does. But that is too much.
>>> 
>>> If we can provide an approach that can minimize the possiple traitset
>>> that the child operator may deliver, we can reduce the chance of missing
>>> good plans. For the above query, the Aggregate operator can derive
>>> possible traitsets that its child operator join can deliver, in which case,
>>> the possiple traitsets of join is
>>> 1. collation on (a,b,c) based on join condition,
>>> 2. collation on (c,b,a) based on left child,
>>> 3. collation on (b,c,a) based on right child
>>> So we can request Aggregate sorted by (c,b) and Join sorted by (c,b,a).
>>> The number of traiset requests and plan alternatives can be reduced.
>>> The DerivedTraitSets can be used to derive the possible traitsets from
>>> Join, and pass through Project, Filter etc...
>>> 
>>> This is just an example of non-distributed system, for distributed system,
>>> it can save much more by considering the possible distribution delivered
>>> by child operators.
>>> 
>>> One thing that concerns me is it highly relies on the traiset system of the
>>> underlying physical system. Like Enumerable doesn't consider distribution,
>>> because it is single-node system, but Hive/Flink are distributed system.
>>> - Haisheng
>>> 
>>> ------------------------------------------------------------------
>>> 发件人：Stamatis Zampetakis<za...@gmail.com>
>>> 日　期：2019年10月18日 14:53:41
>>> 收件人：<de...@calcite.apache.org>
>>> 主　题：Re: [DISCUSS] On-demand traitset request
>>> 
>>> Hi Haisheng,
>>> 
>>> This is an interesting topic but somehow in my mind I thought that this
>>> mechanism is already in place.
>>> 
>>> When an operator (logical or physical) is created its traitset is
>>> determined in bottom-up fashion using the create
>>> static factory method present in almost all operators. In my mind this is
>>> in some sense the applicability function
>>> mentioned in [1].
>>> 
>>> Now during optimization we proceed in top-down manner and we request
>>> certain traitsets from the operators.
>>> If it happens and they contain already the requested traits nothing needs
>>> to be done.
>>> 
>>> In your example when we are about to create the sort-merge join we can
>>> check what traitsets are present in the inputs
>>> and if possible request those. Can you elaborate a bit more why do we need
>>> a new type of metadata?
>>> 
>>> Anyway if we cannot do it at the moment it makes sense to complete the
>>> missing bits since what you are describing
>>> was already mentioned in the original design of the Volcano optimizer [1].
>>> 
>>> "If a move to be pursued is the exploration of a normal query processing
>>> algorithm such as merge-join, its cost is calculated by the algorithm's
>>> cost function. The algorithm's applicability function determines the
>>> physical properly vectors for the algorithms inputs, and their costs and
>>> optimal plans are found by invoking FindBestPlan for the inputs. For some
>>> binary operators, the actual physical properties of the inputs are not as
>>> important as the consistency of physical properties among the inputs. For
>>> example, for a sort-based implementation of intersection, i.e., an
>>> algorithm very similar to merge-join, any sort order of the two inputs will
>>> suffice as long as the two inputs are sorted in the same way. Similarly,
>>> for a parallel join, any partitioning of join inputs across multiple
>>> processing nodes is acceptable if both inputs are partitioned using
>>> Compatible partitioning rules. For these cases, the search engine permits
>>> the optimizer implementor to specify a number of physical property vectors
>>> to be tried. For example, for the intersection of two inputs R and S with
>>> attributes A, B, and C where R is sorted on (A,B,C) and S is sorted on
>>> (B,A,C), both these sort orders can be specified by the optimizer
>>> implementor and will be optimized by the generated optimizer, while other
>>> possible sort orders, e.g., (C,B,A), will be ignored. " [1]
>>> 
>>> Best,
>>> Stamatis
>>> 
>>> [1]
>>> https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Volcano-graefe.pdf
>>> 
>>> On Fri, Oct 18, 2019 at 4:56 AM Haisheng Yuan <h....@alibaba-inc.com>
>>> wrote:
>>> 
>>>> TL;DR
>>>> Both top-down physical TraitSet request and bottom-up TraitSet
>>>> derivation have their strongth and weakness, we propose
>>>> on-demand TraitSet request to combine the above two, to reduce
>>>> the number of plan alternatives that are genereated, especially
>>>> in distributed system.
>>>> 
>>>> e.g.
>>>> select * from foo join bar on f1=b1 and f2=b2 and f3=b3;
>>>> 
>>>> In non-distributed system, we can generate a sort merge join,
>>>> requesting foo sorted by f1,f2,f3 and bar sorted by b1,b2,b3.
>>>> But if foo happens to be sorted by f3,f2,f1, we may miss the
>>>> chance of making use of the delivered ordering of foo. Because
>>>> if we require bar to be sorted by b3,b2,b1, we don't need to
>>>> sort on foo anymore. There are so many choices, n!, not even
>>>> considering asc/desc and null direction. We can't request all
>>>> the possible traitsets in top-down way, and can't derive all the
>>>> possible traitsets in bottom-up way either.
>>>> 
>>>> We propose on-demand traitset request by adding a new type
>>>> of metadata DerivedTraitSets into the built-in metadata system.
>>>> 
>>>> List<RelTraitSet> deriveTraitSets(RelNode, RelMetadataQuery)
>>>> 
>>>> In this metadata, every operator returns several possbile traitsets
>>>> that may be derived from this operator.
>>>> 
>>>> Using above query as an example, the tablescan on foo should
>>>> return traiset with collation on f3, f2, f1.
>>>> 
>>>> In physical implementation rules, e.g. the SortMergeJoinRule,
>>>> it gets possible traitsets from both child operators, uses the join
>>>> keys to eliminate useless traitsets, leaves out usefull traitsets,
>>>> and requests corresponding traitset on the other child.
>>>> 
>>>> This relies on the feature of AbstractConverter, which is turned
>>>> off by default, due to performance issue [1].
>>>> 
>>>> Thoughts?
>>>> 
>>>> [1] https://issues.apache.org/jira/browse/CALCITE-2970
>>>> 
>>>> Haisheng
>>>> 
>>>> 
>>>

Re: Re: [DISCUSS] On-demand traitset request

Posted by Jinfeng Ni <jn...@apache.org>.

A little bit of history.  In Drill,  when we first implemented
Distribution trait's definition,  we allows both exact match and
partial match in satisfy() method. This works fine for single-input
operator such aggregation, however it leads to incorrect plan for join
query, i.e LHS shuffle with (a, b),  RHS shuffle with (a) .  At that
time, we removed partial match, and use exact match only. Yet this
changes leads to unnecessary additional exchange.  To mitigate this
problem, in join physical operator, for a join key (a, b, c),  we
enumerate different distribution requests, yet this lead to more space
to explore and significantly increase planning time (which is probably
what Haisheng also experienced).  When I look back, I feel probably
what we miss is the "coordination" step in the join operator, because
if we relax the requirement of satisfy(), for multi-input operators,
we have to enforce some "coordination", to make sure multiple input's
trait could work together properly.



On Fri, Oct 18, 2019 at 4:38 PM Jinfeng Ni <jn...@apache.org> wrote:
>
> This is an interesting topic. Thanks for bringing up this issue.
>
> My understanding of Volcano planner is it works in a top-down search
> mode (the parent asks for certain trait of its child), while the trait
> propagates in a bottom-up way, as Stamatis explained.
>
> IMHO, the issue comes down to the definition of RelTrait, how to
> determine if a trait A could satisfy a request asking for trait B,
> that is, how RelTrait.satisfies() method is implemented.
>
> Let's first clarify different situations, using collation as example.
> 1) The collation is requested by query's outmost ORDER BY clause.
>    - The generated plan has to have "exact match", i.e same collation
> (same column sequence), or "super match" .
> exact match:   (a, b)  satisfy  (a, b)
> super match:   (a, b, c)  satisfy (a, b)
>
> 2) The collation is requested by operand with single input, such as
> sort-based Aggregation.
>    - In such case, a "permutation match" is sufficient.
> For instance,  for Aggregation (b,c),  input with collation (c, b)
> could satisfy the requirement.
> permutation match:  (b, c) satisfy (c, b).         (c, b) satisfy (c, b)
> permutation match:  (b, c, a) satisfy (c, b).     (c, b, a) satisfy (c, b)
>
> 3) The collation is requested by operand with >= 2 inputs, such as
> sort-based MergeJoin.
>   - A permutation match is sufficient for each input
>   - MergeJoin has to do coordination, after input's trait propagates
> upwards. In other words,  ensure both inputs's permutation match are
> actually same sequence. Otherwise,  enforcer could be inserted upon
> each input, and the planner generates two plans and let the cost
> decide.
>
> For the first case, this is how today's RelCollation's satisfy()
> method is implemented.
>
> For the second / third cases, use Haisheng's example,
>
> SELECT DISTINCT c, b FROM
>   ( SELECT R.c c, S.b b FROM R, S
>         WHERE R.a=S.a and R.b=S.b and R.c=S.c) t;
>
> Aggregate . (c, b)
>     +--- MergeJoin . (a, b, c)
>                 |--- TableScan on R
>                 +--- TableScan on S
>
> Here is the steps that might take place in the planner:
>
> 1) Aggregate request permutation match collation (c, b)
> 2) MergeJoin request a permutation match of (a, b,c) on both it's input
> 3) R respond with collation (c, b, a), which satisfy MergeJoin's LHS requirement
> 4) S respond with collation (b, c, a), which satisfy MergeJoins' RHS requirement
> 5) MergeJoin do a coordination o LHS, RHS, and generate two possible plans
>    MJ1:   Insert a sort of (c, b, a) on RHS.  This MJ operator now has
> collation of (c, b, a)
>    MJ2:   Insert a sort of (b, c, a) on LHS.  This MJ operator now has
> collation of (b, c, a)
> 6) MJ1 and MJ2 could both satisfy  permutation match request in step
> 1, leading to two possible plans:
>   Agg1:  with input of MJ1
>   Agg2:  with input of MJ2
> 7) planner chooses a best plan based on cost of Agg1 and Agg2.
>
> I should point that the enforcer sort inserted in step 5 could help
> remove redundant sort in its input, if the input's collation is
> obtained from sort, by invoking Calcite's SortRemove Rule.
>
> The above only considers the column sequence. The DESC/ASC, NULL
> FIRST/LAST will add more complexity, but we probably use similar idea.
>
> In summary,  we need :
>   1) redefine collation trait's satisfy() policy,  exact match, super
> match, permutation match,
>   2) different physical operator applies different trait matching
> policy, depending on operator's # of inputs, and algorithm
> implementation.
>
>
>
>
>
> On Fri, Oct 18, 2019 at 2:51 PM Haisheng Yuan <h....@alibaba-inc.com> wrote:
> >
> > Hi Stamatis,
> >
> > Thanks for your comment. I think my example didn't make it clear.
> >
> > When a logical operator is created, it doesn't have any physical,
> > propertyand it shouldn't have. When a physical operator is created,
> > e.g. in Enumerable convention, it only creates an intuitive traitset
> > with it, and requests it children the corresponding ones.
> >
> > For operators such as Join, Aggregate, Window, which may deliver
> > multiple different traitsets, when the parent operator is created and
> > request its traitset, it might be good to know what are the poosible
> > traitset that the child operator can deliver. e.g.
> >
> > SELECT DISTINCT c, b FROM
> >   ( SELECT R.c c, S.b b FROM R, S
> >         WHERE R.a=S.a and R.b=S.b and R.c=S.c) t;
> >
> > Suppose R is ordered by (c, b, a), and S is ordered by (b, c, a).
> > Here is the logical plan:
> > Aggregate
> >     +--- InnerJoin
> >                 |--- TableScan on R
> >                 +--- TableScan on S
> >
> > When we create a physical merge join for the inner join, it may just
> > have collation sorted on a,b,c. Then the aggreate on top of join will
> > request another sort on c,b, thus we miss the best plan. What we
> > can do is requesting all the order combinations, which is n!, like
> > how the Values operator does. But that is too much.
> >
> > If we can provide an approach that can minimize the possiple traitset
> > that the child operator may deliver, we can reduce the chance of missing
> > good plans. For the above query, the Aggregate operator can derive
> > possible traitsets that its child operator join can deliver, in which case,
> > the possiple traitsets of join is
> > 1. collation on (a,b,c) based on join condition,
> > 2. collation on (c,b,a) based on left child,
> > 3. collation on (b,c,a) based on right child
> > So we can request Aggregate sorted by (c,b) and Join sorted by (c,b,a).
> > The number of traiset requests and plan alternatives can be reduced.
> > The DerivedTraitSets can be used to derive the possible traitsets from
> > Join, and pass through Project, Filter etc...
> >
> > This is just an example of non-distributed system, for distributed system,
> > it can save much more by considering the possible distribution delivered
> > by child operators.
> >
> > One thing that concerns me is it highly relies on the traiset system of the
> > underlying physical system. Like Enumerable doesn't consider distribution,
> > because it is single-node system, but Hive/Flink are distributed system.
> > - Haisheng
> >
> > ------------------------------------------------------------------
> > 发件人：Stamatis Zampetakis<za...@gmail.com>
> > 日　期：2019年10月18日 14:53:41
> > 收件人：<de...@calcite.apache.org>
> > 主　题：Re: [DISCUSS] On-demand traitset request
> >
> > Hi Haisheng,
> >
> > This is an interesting topic but somehow in my mind I thought that this
> > mechanism is already in place.
> >
> > When an operator (logical or physical) is created its traitset is
> > determined in bottom-up fashion using the create
> > static factory method present in almost all operators. In my mind this is
> > in some sense the applicability function
> > mentioned in [1].
> >
> > Now during optimization we proceed in top-down manner and we request
> > certain traitsets from the operators.
> > If it happens and they contain already the requested traits nothing needs
> > to be done.
> >
> > In your example when we are about to create the sort-merge join we can
> > check what traitsets are present in the inputs
> > and if possible request those. Can you elaborate a bit more why do we need
> > a new type of metadata?
> >
> > Anyway if we cannot do it at the moment it makes sense to complete the
> > missing bits since what you are describing
> > was already mentioned in the original design of the Volcano optimizer [1].
> >
> > "If a move to be pursued is the exploration of a normal query processing
> > algorithm such as merge-join, its cost is calculated by the algorithm's
> > cost function. The algorithm's applicability function determines the
> > physical properly vectors for the algorithms inputs, and their costs and
> > optimal plans are found by invoking FindBestPlan for the inputs. For some
> > binary operators, the actual physical properties of the inputs are not as
> > important as the consistency of physical properties among the inputs. For
> > example, for a sort-based implementation of intersection, i.e., an
> > algorithm very similar to merge-join, any sort order of the two inputs will
> > suffice as long as the two inputs are sorted in the same way. Similarly,
> > for a parallel join, any partitioning of join inputs across multiple
> > processing nodes is acceptable if both inputs are partitioned using
> > Compatible partitioning rules. For these cases, the search engine permits
> > the optimizer implementor to specify a number of physical property vectors
> > to be tried. For example, for the intersection of two inputs R and S with
> > attributes A, B, and C where R is sorted on (A,B,C) and S is sorted on
> > (B,A,C), both these sort orders can be specified by the optimizer
> > implementor and will be optimized by the generated optimizer, while other
> > possible sort orders, e.g., (C,B,A), will be ignored. " [1]
> >
> > Best,
> > Stamatis
> >
> > [1]
> > https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Volcano-graefe.pdf
> >
> > On Fri, Oct 18, 2019 at 4:56 AM Haisheng Yuan <h....@alibaba-inc.com>
> > wrote:
> >
> > > TL;DR
> > > Both top-down physical TraitSet request and bottom-up TraitSet
> > > derivation have their strongth and weakness, we propose
> > > on-demand TraitSet request to combine the above two, to reduce
> > > the number of plan alternatives that are genereated, especially
> > > in distributed system.
> > >
> > > e.g.
> > > select * from foo join bar on f1=b1 and f2=b2 and f3=b3;
> > >
> > > In non-distributed system, we can generate a sort merge join,
> > > requesting foo sorted by f1,f2,f3 and bar sorted by b1,b2,b3.
> > > But if foo happens to be sorted by f3,f2,f1, we may miss the
> > > chance of making use of the delivered ordering of foo. Because
> > > if we require bar to be sorted by b3,b2,b1, we don't need to
> > > sort on foo anymore. There are so many choices, n!, not even
> > > considering asc/desc and null direction. We can't request all
> > > the possible traitsets in top-down way, and can't derive all the
> > > possible traitsets in bottom-up way either.
> > >
> > > We propose on-demand traitset request by adding a new type
> > > of metadata DerivedTraitSets into the built-in metadata system.
> > >
> > > List<RelTraitSet> deriveTraitSets(RelNode, RelMetadataQuery)
> > >
> > > In this metadata, every operator returns several possbile traitsets
> > > that may be derived from this operator.
> > >
> > > Using above query as an example, the tablescan on foo should
> > > return traiset with collation on f3, f2, f1.
> > >
> > > In physical implementation rules, e.g. the SortMergeJoinRule,
> > > it gets possible traitsets from both child operators, uses the join
> > > keys to eliminate useless traitsets, leaves out usefull traitsets,
> > > and requests corresponding traitset on the other child.
> > >
> > > This relies on the feature of AbstractConverter, which is turned
> > > off by default, due to performance issue [1].
> > >
> > > Thoughts?
> > >
> > > [1] https://issues.apache.org/jira/browse/CALCITE-2970
> > >
> > > Haisheng
> > >
> > >
> >

Re: Re: [DISCUSS] On-demand traitset request

Posted by Jinfeng Ni <jn...@apache.org>.

This is an interesting topic. Thanks for bringing up this issue.

My understanding of Volcano planner is it works in a top-down search
mode (the parent asks for certain trait of its child), while the trait
propagates in a bottom-up way, as Stamatis explained.

IMHO, the issue comes down to the definition of RelTrait, how to
determine if a trait A could satisfy a request asking for trait B,
that is, how RelTrait.satisfies() method is implemented.

Let's first clarify different situations, using collation as example.
1) The collation is requested by query's outmost ORDER BY clause.
   - The generated plan has to have "exact match", i.e same collation
(same column sequence), or "super match" .
exact match:   (a, b)  satisfy  (a, b)
super match:   (a, b, c)  satisfy (a, b)

2) The collation is requested by operand with single input, such as
sort-based Aggregation.
   - In such case, a "permutation match" is sufficient.
For instance,  for Aggregation (b,c),  input with collation (c, b)
could satisfy the requirement.
permutation match:  (b, c) satisfy (c, b).         (c, b) satisfy (c, b)
permutation match:  (b, c, a) satisfy (c, b).     (c, b, a) satisfy (c, b)

3) The collation is requested by operand with >= 2 inputs, such as
sort-based MergeJoin.
  - A permutation match is sufficient for each input
  - MergeJoin has to do coordination, after input's trait propagates
upwards. In other words,  ensure both inputs's permutation match are
actually same sequence. Otherwise,  enforcer could be inserted upon
each input, and the planner generates two plans and let the cost
decide.

For the first case, this is how today's RelCollation's satisfy()
method is implemented.

For the second / third cases, use Haisheng's example,

SELECT DISTINCT c, b FROM
  ( SELECT R.c c, S.b b FROM R, S
        WHERE R.a=S.a and R.b=S.b and R.c=S.c) t;

Aggregate . (c, b)
    +--- MergeJoin . (a, b, c)
                |--- TableScan on R
                +--- TableScan on S

Here is the steps that might take place in the planner:

1) Aggregate request permutation match collation (c, b)
2) MergeJoin request a permutation match of (a, b,c) on both it's input
3) R respond with collation (c, b, a), which satisfy MergeJoin's LHS requirement
4) S respond with collation (b, c, a), which satisfy MergeJoins' RHS requirement
5) MergeJoin do a coordination o LHS, RHS, and generate two possible plans
   MJ1:   Insert a sort of (c, b, a) on RHS.  This MJ operator now has
collation of (c, b, a)
   MJ2:   Insert a sort of (b, c, a) on LHS.  This MJ operator now has
collation of (b, c, a)
6) MJ1 and MJ2 could both satisfy  permutation match request in step
1, leading to two possible plans:
  Agg1:  with input of MJ1
  Agg2:  with input of MJ2
7) planner chooses a best plan based on cost of Agg1 and Agg2.

I should point that the enforcer sort inserted in step 5 could help
remove redundant sort in its input, if the input's collation is
obtained from sort, by invoking Calcite's SortRemove Rule.

The above only considers the column sequence. The DESC/ASC, NULL
FIRST/LAST will add more complexity, but we probably use similar idea.

In summary,  we need :
  1) redefine collation trait's satisfy() policy,  exact match, super
match, permutation match,
  2) different physical operator applies different trait matching
policy, depending on operator's # of inputs, and algorithm
implementation.





On Fri, Oct 18, 2019 at 2:51 PM Haisheng Yuan <h....@alibaba-inc.com> wrote:
>
> Hi Stamatis,
>
> Thanks for your comment. I think my example didn't make it clear.
>
> When a logical operator is created, it doesn't have any physical,
> propertyand it shouldn't have. When a physical operator is created,
> e.g. in Enumerable convention, it only creates an intuitive traitset
> with it, and requests it children the corresponding ones.
>
> For operators such as Join, Aggregate, Window, which may deliver
> multiple different traitsets, when the parent operator is created and
> request its traitset, it might be good to know what are the poosible
> traitset that the child operator can deliver. e.g.
>
> SELECT DISTINCT c, b FROM
>   ( SELECT R.c c, S.b b FROM R, S
>         WHERE R.a=S.a and R.b=S.b and R.c=S.c) t;
>
> Suppose R is ordered by (c, b, a), and S is ordered by (b, c, a).
> Here is the logical plan:
> Aggregate
>     +--- InnerJoin
>                 |--- TableScan on R
>                 +--- TableScan on S
>
> When we create a physical merge join for the inner join, it may just
> have collation sorted on a,b,c. Then the aggreate on top of join will
> request another sort on c,b, thus we miss the best plan. What we
> can do is requesting all the order combinations, which is n!, like
> how the Values operator does. But that is too much.
>
> If we can provide an approach that can minimize the possiple traitset
> that the child operator may deliver, we can reduce the chance of missing
> good plans. For the above query, the Aggregate operator can derive
> possible traitsets that its child operator join can deliver, in which case,
> the possiple traitsets of join is
> 1. collation on (a,b,c) based on join condition,
> 2. collation on (c,b,a) based on left child,
> 3. collation on (b,c,a) based on right child
> So we can request Aggregate sorted by (c,b) and Join sorted by (c,b,a).
> The number of traiset requests and plan alternatives can be reduced.
> The DerivedTraitSets can be used to derive the possible traitsets from
> Join, and pass through Project, Filter etc...
>
> This is just an example of non-distributed system, for distributed system,
> it can save much more by considering the possible distribution delivered
> by child operators.
>
> One thing that concerns me is it highly relies on the traiset system of the
> underlying physical system. Like Enumerable doesn't consider distribution,
> because it is single-node system, but Hive/Flink are distributed system.
> - Haisheng
>
> ------------------------------------------------------------------
> 发件人：Stamatis Zampetakis<za...@gmail.com>
> 日　期：2019年10月18日 14:53:41
> 收件人：<de...@calcite.apache.org>
> 主　题：Re: [DISCUSS] On-demand traitset request
>
> Hi Haisheng,
>
> This is an interesting topic but somehow in my mind I thought that this
> mechanism is already in place.
>
> When an operator (logical or physical) is created its traitset is
> determined in bottom-up fashion using the create
> static factory method present in almost all operators. In my mind this is
> in some sense the applicability function
> mentioned in [1].
>
> Now during optimization we proceed in top-down manner and we request
> certain traitsets from the operators.
> If it happens and they contain already the requested traits nothing needs
> to be done.
>
> In your example when we are about to create the sort-merge join we can
> check what traitsets are present in the inputs
> and if possible request those. Can you elaborate a bit more why do we need
> a new type of metadata?
>
> Anyway if we cannot do it at the moment it makes sense to complete the
> missing bits since what you are describing
> was already mentioned in the original design of the Volcano optimizer [1].
>
> "If a move to be pursued is the exploration of a normal query processing
> algorithm such as merge-join, its cost is calculated by the algorithm's
> cost function. The algorithm's applicability function determines the
> physical properly vectors for the algorithms inputs, and their costs and
> optimal plans are found by invoking FindBestPlan for the inputs. For some
> binary operators, the actual physical properties of the inputs are not as
> important as the consistency of physical properties among the inputs. For
> example, for a sort-based implementation of intersection, i.e., an
> algorithm very similar to merge-join, any sort order of the two inputs will
> suffice as long as the two inputs are sorted in the same way. Similarly,
> for a parallel join, any partitioning of join inputs across multiple
> processing nodes is acceptable if both inputs are partitioned using
> Compatible partitioning rules. For these cases, the search engine permits
> the optimizer implementor to specify a number of physical property vectors
> to be tried. For example, for the intersection of two inputs R and S with
> attributes A, B, and C where R is sorted on (A,B,C) and S is sorted on
> (B,A,C), both these sort orders can be specified by the optimizer
> implementor and will be optimized by the generated optimizer, while other
> possible sort orders, e.g., (C,B,A), will be ignored. " [1]
>
> Best,
> Stamatis
>
> [1]
> https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Volcano-graefe.pdf
>
> On Fri, Oct 18, 2019 at 4:56 AM Haisheng Yuan <h....@alibaba-inc.com>
> wrote:
>
> > TL;DR
> > Both top-down physical TraitSet request and bottom-up TraitSet
> > derivation have their strongth and weakness, we propose
> > on-demand TraitSet request to combine the above two, to reduce
> > the number of plan alternatives that are genereated, especially
> > in distributed system.
> >
> > e.g.
> > select * from foo join bar on f1=b1 and f2=b2 and f3=b3;
> >
> > In non-distributed system, we can generate a sort merge join,
> > requesting foo sorted by f1,f2,f3 and bar sorted by b1,b2,b3.
> > But if foo happens to be sorted by f3,f2,f1, we may miss the
> > chance of making use of the delivered ordering of foo. Because
> > if we require bar to be sorted by b3,b2,b1, we don't need to
> > sort on foo anymore. There are so many choices, n!, not even
> > considering asc/desc and null direction. We can't request all
> > the possible traitsets in top-down way, and can't derive all the
> > possible traitsets in bottom-up way either.
> >
> > We propose on-demand traitset request by adding a new type
> > of metadata DerivedTraitSets into the built-in metadata system.
> >
> > List<RelTraitSet> deriveTraitSets(RelNode, RelMetadataQuery)
> >
> > In this metadata, every operator returns several possbile traitsets
> > that may be derived from this operator.
> >
> > Using above query as an example, the tablescan on foo should
> > return traiset with collation on f3, f2, f1.
> >
> > In physical implementation rules, e.g. the SortMergeJoinRule,
> > it gets possible traitsets from both child operators, uses the join
> > keys to eliminate useless traitsets, leaves out usefull traitsets,
> > and requests corresponding traitset on the other child.
> >
> > This relies on the feature of AbstractConverter, which is turned
> > off by default, due to performance issue [1].
> >
> > Thoughts?
> >
> > [1] https://issues.apache.org/jira/browse/CALCITE-2970
> >
> > Haisheng
> >
> >
>

Re: Re: [DISCUSS] On-demand traitset request

Posted by Haisheng Yuan <h....@alibaba-inc.com>.

Hi Stamatis,

Thanks for your comment. I think my example didn't make it clear.

When a logical operator is created, it doesn't have any physical,
propertyand it shouldn't have. When a physical operator is created,
e.g. in Enumerable convention, it only creates an intuitive traitset
with it, and requests it children the corresponding ones.

For operators such as Join, Aggregate, Window, which may deliver 
multiple different traitsets, when the parent operator is created and
request its traitset, it might be good to know what are the poosible
traitset that the child operator can deliver. e.g.

SELECT DISTINCT c, b FROM
  ( SELECT R.c c, S.b b FROM R, S 
        WHERE R.a=S.a and R.b=S.b and R.c=S.c) t;

Suppose R is ordered by (c, b, a), and S is ordered by (b, c, a).
Here is the logical plan:
Aggregate
    +--- InnerJoin
                |--- TableScan on R
                +--- TableScan on S

When we create a physical merge join for the inner join, it may just
have collation sorted on a,b,c. Then the aggreate on top of join will
request another sort on c,b, thus we miss the best plan. What we
can do is requesting all the order combinations, which is n!, like
how the Values operator does. But that is too much.

If we can provide an approach that can minimize the possiple traitset
that the child operator may deliver, we can reduce the chance of missing
good plans. For the above query, the Aggregate operator can derive
possible traitsets that its child operator join can deliver, in which case,
the possiple traitsets of join is 
1. collation on (a,b,c) based on join condition, 
2. collation on (c,b,a) based on left child,
3. collation on (b,c,a) based on right child 
So we can request Aggregate sorted by (c,b) and Join sorted by (c,b,a).
The number of traiset requests and plan alternatives can be reduced.
The DerivedTraitSets can be used to derive the possible traitsets from
Join, and pass through Project, Filter etc...

This is just an example of non-distributed system, for distributed system,
it can save much more by considering the possible distribution delivered
by child operators.

One thing that concerns me is it highly relies on the traiset system of the
underlying physical system. Like Enumerable doesn't consider distribution,
because it is single-node system, but Hive/Flink are distributed system.
- Haisheng

------------------------------------------------------------------
发件人：Stamatis Zampetakis<za...@gmail.com>
日　期：2019年10月18日 14:53:41
收件人：<de...@calcite.apache.org>
主　题：Re: [DISCUSS] On-demand traitset request

Hi Haisheng,

This is an interesting topic but somehow in my mind I thought that this
mechanism is already in place.

When an operator (logical or physical) is created its traitset is
determined in bottom-up fashion using the create
static factory method present in almost all operators. In my mind this is
in some sense the applicability function
mentioned in [1].

Now during optimization we proceed in top-down manner and we request
certain traitsets from the operators.
If it happens and they contain already the requested traits nothing needs
to be done.

In your example when we are about to create the sort-merge join we can
check what traitsets are present in the inputs
and if possible request those. Can you elaborate a bit more why do we need
a new type of metadata?

Anyway if we cannot do it at the moment it makes sense to complete the
missing bits since what you are describing
was already mentioned in the original design of the Volcano optimizer [1].

"If a move to be pursued is the exploration of a normal query processing
algorithm such as merge-join, its cost is calculated by the algorithm's
cost function. The algorithm's applicability function determines the
physical properly vectors for the algorithms inputs, and their costs and
optimal plans are found by invoking FindBestPlan for the inputs. For some
binary operators, the actual physical properties of the inputs are not as
important as the consistency of physical properties among the inputs. For
example, for a sort-based implementation of intersection, i.e., an
algorithm very similar to merge-join, any sort order of the two inputs will
suffice as long as the two inputs are sorted in the same way. Similarly,
for a parallel join, any partitioning of join inputs across multiple
processing nodes is acceptable if both inputs are partitioned using
Compatible partitioning rules. For these cases, the search engine permits
the optimizer implementor to specify a number of physical property vectors
to be tried. For example, for the intersection of two inputs R and S with
attributes A, B, and C where R is sorted on (A,B,C) and S is sorted on
(B,A,C), both these sort orders can be specified by the optimizer
implementor and will be optimized by the generated optimizer, while other
possible sort orders, e.g., (C,B,A), will be ignored. " [1]

Best,
Stamatis

[1]
https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Volcano-graefe.pdf

On Fri, Oct 18, 2019 at 4:56 AM Haisheng Yuan <h....@alibaba-inc.com>
wrote:

> TL;DR
> Both top-down physical TraitSet request and bottom-up TraitSet
> derivation have their strongth and weakness, we propose
> on-demand TraitSet request to combine the above two, to reduce
> the number of plan alternatives that are genereated, especially
> in distributed system.
>
> e.g.
> select * from foo join bar on f1=b1 and f2=b2 and f3=b3;
>
> In non-distributed system, we can generate a sort merge join,
> requesting foo sorted by f1,f2,f3 and bar sorted by b1,b2,b3.
> But if foo happens to be sorted by f3,f2,f1, we may miss the
> chance of making use of the delivered ordering of foo. Because
> if we require bar to be sorted by b3,b2,b1, we don't need to
> sort on foo anymore. There are so many choices, n!, not even
> considering asc/desc and null direction. We can't request all
> the possible traitsets in top-down way, and can't derive all the
> possible traitsets in bottom-up way either.
>
> We propose on-demand traitset request by adding a new type
> of metadata DerivedTraitSets into the built-in metadata system.
>
> List<RelTraitSet> deriveTraitSets(RelNode, RelMetadataQuery)
>
> In this metadata, every operator returns several possbile traitsets
> that may be derived from this operator.
>
> Using above query as an example, the tablescan on foo should
> return traiset with collation on f3, f2, f1.
>
> In physical implementation rules, e.g. the SortMergeJoinRule,
> it gets possible traitsets from both child operators, uses the join
> keys to eliminate useless traitsets, leaves out usefull traitsets,
> and requests corresponding traitset on the other child.
>
> This relies on the feature of AbstractConverter, which is turned
> off by default, due to performance issue [1].
>
> Thoughts?
>
> [1] https://issues.apache.org/jira/browse/CALCITE-2970
>
> Haisheng
>
>

Re: [DISCUSS] On-demand traitset request

Posted by Julian Hyde <jh...@apache.org>.

To clarify. The purpose of this API would be to give the search engine
more high-level as to the goals it should focus on. The performance
issues described in https://issues.apache.org/jira/browse/CALCITE-2970
seem to be due to the planner "trying everything", and the solution
might be to add a bit more high-level structure.

On Fri, Oct 18, 2019 at 11:07 AM Julian Hyde <jh...@apache.org> wrote:
>
> Excellent, very important discussion. This has been a major missing
> feature for a long time. Let's be sure to get to a conclusion and
> implement something.
>
> From the Volcano paper:
>
>   "the search engine permits the optimizer implementor to specify
>   a number of physical property vectors to be tried"
>
> How would we achieve this? Would we add an API to RelOptRule? If so,
> what would that API look like?
>
> On Thu, Oct 17, 2019 at 11:54 PM Stamatis Zampetakis <za...@gmail.com> wrote:
> >
> > Hi Haisheng,
> >
> > This is an interesting topic but somehow in my mind I thought that this
> > mechanism is already in place.
> >
> > When an operator (logical or physical) is created its traitset is
> > determined in bottom-up fashion using the create
> > static factory method present in almost all operators. In my mind this is
> > in some sense the applicability function
> > mentioned in [1].
> >
> > Now during optimization we proceed in top-down manner and we request
> > certain traitsets from the operators.
> > If it happens and they contain already the requested traits nothing needs
> > to be done.
> >
> > In your example when we are about to create the sort-merge join we can
> > check what traitsets are present in the inputs
> > and if possible request those. Can you elaborate a bit more why do we need
> > a new type of metadata?
> >
> > Anyway if we cannot do it at the moment it makes sense to complete the
> > missing bits since what you are describing
> > was already mentioned in the original design of the Volcano optimizer [1].
> >
> > "If a move to be pursued is the exploration of a normal query processing
> > algorithm such as merge-join, its cost is calculated by the algorithm's
> > cost function. The algorithm's applicability function determines the
> > physical properly vectors for the algorithms inputs, and their costs and
> > optimal plans are found by invoking FindBestPlan for the inputs. For some
> > binary operators, the actual physical properties of the inputs are not as
> > important as the consistency of physical properties among the inputs. For
> > example, for a sort-based implementation of intersection, i.e., an
> > algorithm very similar to merge-join, any sort order of the two inputs will
> > suffice as long as the two inputs are sorted in the same way. Similarly,
> > for a parallel join, any partitioning of join inputs across multiple
> > processing nodes is acceptable if both inputs are partitioned using
> > Compatible partitioning rules. For these cases, the search engine permits
> > the optimizer implementor to specify a number of physical property vectors
> > to be tried. For example, for the intersection of two inputs R and S with
> > attributes A, B, and C where R is sorted on (A,B,C) and S is sorted on
> > (B,A,C), both these sort orders can be specified by the optimizer
> > implementor and will be optimized by the generated optimizer, while other
> > possible sort orders, e.g., (C,B,A), will be ignored. " [1]
> >
> > Best,
> > Stamatis
> >
> > [1]
> > https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Volcano-graefe.pdf
> >
> > On Fri, Oct 18, 2019 at 4:56 AM Haisheng Yuan <h....@alibaba-inc.com>
> > wrote:
> >
> > > TL;DR
> > > Both top-down physical TraitSet request and bottom-up TraitSet
> > > derivation have their strongth and weakness, we propose
> > > on-demand TraitSet request to combine the above two, to reduce
> > > the number of plan alternatives that are genereated, especially
> > > in distributed system.
> > >
> > > e.g.
> > > select * from foo join bar on f1=b1 and f2=b2 and f3=b3;
> > >
> > > In non-distributed system, we can generate a sort merge join,
> > > requesting foo sorted by f1,f2,f3 and bar sorted by b1,b2,b3.
> > > But if foo happens to be sorted by f3,f2,f1, we may miss the
> > > chance of making use of the delivered ordering of foo. Because
> > > if we require bar to be sorted by b3,b2,b1, we don't need to
> > > sort on foo anymore. There are so many choices, n!, not even
> > > considering asc/desc and null direction. We can't request all
> > > the possible traitsets in top-down way, and can't derive all the
> > > possible traitsets in bottom-up way either.
> > >
> > > We propose on-demand traitset request by adding a new type
> > > of metadata DerivedTraitSets into the built-in metadata system.
> > >
> > > List<RelTraitSet> deriveTraitSets(RelNode, RelMetadataQuery)
> > >
> > > In this metadata, every operator returns several possbile traitsets
> > > that may be derived from this operator.
> > >
> > > Using above query as an example, the tablescan on foo should
> > > return traiset with collation on f3, f2, f1.
> > >
> > > In physical implementation rules, e.g. the SortMergeJoinRule,
> > > it gets possible traitsets from both child operators, uses the join
> > > keys to eliminate useless traitsets, leaves out usefull traitsets,
> > > and requests corresponding traitset on the other child.
> > >
> > > This relies on the feature of AbstractConverter, which is turned
> > > off by default, due to performance issue [1].
> > >
> > > Thoughts?
> > >
> > > [1] https://issues.apache.org/jira/browse/CALCITE-2970
> > >
> > > Haisheng
> > >
> > >

Re: [DISCUSS] On-demand traitset request

Posted by Julian Hyde <jh...@apache.org>.

Excellent, very important discussion. This has been a major missing
feature for a long time. Let's be sure to get to a conclusion and
implement something.

From the Volcano paper:

  "the search engine permits the optimizer implementor to specify
  a number of physical property vectors to be tried"

How would we achieve this? Would we add an API to RelOptRule? If so,
what would that API look like?

On Thu, Oct 17, 2019 at 11:54 PM Stamatis Zampetakis <za...@gmail.com> wrote:
>
> Hi Haisheng,
>
> This is an interesting topic but somehow in my mind I thought that this
> mechanism is already in place.
>
> When an operator (logical or physical) is created its traitset is
> determined in bottom-up fashion using the create
> static factory method present in almost all operators. In my mind this is
> in some sense the applicability function
> mentioned in [1].
>
> Now during optimization we proceed in top-down manner and we request
> certain traitsets from the operators.
> If it happens and they contain already the requested traits nothing needs
> to be done.
>
> In your example when we are about to create the sort-merge join we can
> check what traitsets are present in the inputs
> and if possible request those. Can you elaborate a bit more why do we need
> a new type of metadata?
>
> Anyway if we cannot do it at the moment it makes sense to complete the
> missing bits since what you are describing
> was already mentioned in the original design of the Volcano optimizer [1].
>
> "If a move to be pursued is the exploration of a normal query processing
> algorithm such as merge-join, its cost is calculated by the algorithm's
> cost function. The algorithm's applicability function determines the
> physical properly vectors for the algorithms inputs, and their costs and
> optimal plans are found by invoking FindBestPlan for the inputs. For some
> binary operators, the actual physical properties of the inputs are not as
> important as the consistency of physical properties among the inputs. For
> example, for a sort-based implementation of intersection, i.e., an
> algorithm very similar to merge-join, any sort order of the two inputs will
> suffice as long as the two inputs are sorted in the same way. Similarly,
> for a parallel join, any partitioning of join inputs across multiple
> processing nodes is acceptable if both inputs are partitioned using
> Compatible partitioning rules. For these cases, the search engine permits
> the optimizer implementor to specify a number of physical property vectors
> to be tried. For example, for the intersection of two inputs R and S with
> attributes A, B, and C where R is sorted on (A,B,C) and S is sorted on
> (B,A,C), both these sort orders can be specified by the optimizer
> implementor and will be optimized by the generated optimizer, while other
> possible sort orders, e.g., (C,B,A), will be ignored. " [1]
>
> Best,
> Stamatis
>
> [1]
> https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Volcano-graefe.pdf
>
> On Fri, Oct 18, 2019 at 4:56 AM Haisheng Yuan <h....@alibaba-inc.com>
> wrote:
>
> > TL;DR
> > Both top-down physical TraitSet request and bottom-up TraitSet
> > derivation have their strongth and weakness, we propose
> > on-demand TraitSet request to combine the above two, to reduce
> > the number of plan alternatives that are genereated, especially
> > in distributed system.
> >
> > e.g.
> > select * from foo join bar on f1=b1 and f2=b2 and f3=b3;
> >
> > In non-distributed system, we can generate a sort merge join,
> > requesting foo sorted by f1,f2,f3 and bar sorted by b1,b2,b3.
> > But if foo happens to be sorted by f3,f2,f1, we may miss the
> > chance of making use of the delivered ordering of foo. Because
> > if we require bar to be sorted by b3,b2,b1, we don't need to
> > sort on foo anymore. There are so many choices, n!, not even
> > considering asc/desc and null direction. We can't request all
> > the possible traitsets in top-down way, and can't derive all the
> > possible traitsets in bottom-up way either.
> >
> > We propose on-demand traitset request by adding a new type
> > of metadata DerivedTraitSets into the built-in metadata system.
> >
> > List<RelTraitSet> deriveTraitSets(RelNode, RelMetadataQuery)
> >
> > In this metadata, every operator returns several possbile traitsets
> > that may be derived from this operator.
> >
> > Using above query as an example, the tablescan on foo should
> > return traiset with collation on f3, f2, f1.
> >
> > In physical implementation rules, e.g. the SortMergeJoinRule,
> > it gets possible traitsets from both child operators, uses the join
> > keys to eliminate useless traitsets, leaves out usefull traitsets,
> > and requests corresponding traitset on the other child.
> >
> > This relies on the feature of AbstractConverter, which is turned
> > off by default, due to performance issue [1].
> >
> > Thoughts?
> >
> > [1] https://issues.apache.org/jira/browse/CALCITE-2970
> >
> > Haisheng
> >
> >

Re: [DISCUSS] On-demand traitset request

Posted by Stamatis Zampetakis <za...@gmail.com>.

Hi Haisheng,

This is an interesting topic but somehow in my mind I thought that this
mechanism is already in place.

When an operator (logical or physical) is created its traitset is
determined in bottom-up fashion using the create
static factory method present in almost all operators. In my mind this is
in some sense the applicability function
mentioned in [1].

Now during optimization we proceed in top-down manner and we request
certain traitsets from the operators.
If it happens and they contain already the requested traits nothing needs
to be done.

In your example when we are about to create the sort-merge join we can
check what traitsets are present in the inputs
and if possible request those. Can you elaborate a bit more why do we need
a new type of metadata?

Anyway if we cannot do it at the moment it makes sense to complete the
missing bits since what you are describing
was already mentioned in the original design of the Volcano optimizer [1].

"If a move to be pursued is the exploration of a normal query processing
algorithm such as merge-join, its cost is calculated by the algorithm's
cost function. The algorithm's applicability function determines the
physical properly vectors for the algorithms inputs, and their costs and
optimal plans are found by invoking FindBestPlan for the inputs. For some
binary operators, the actual physical properties of the inputs are not as
important as the consistency of physical properties among the inputs. For
example, for a sort-based implementation of intersection, i.e., an
algorithm very similar to merge-join, any sort order of the two inputs will
suffice as long as the two inputs are sorted in the same way. Similarly,
for a parallel join, any partitioning of join inputs across multiple
processing nodes is acceptable if both inputs are partitioned using
Compatible partitioning rules. For these cases, the search engine permits
the optimizer implementor to specify a number of physical property vectors
to be tried. For example, for the intersection of two inputs R and S with
attributes A, B, and C where R is sorted on (A,B,C) and S is sorted on
(B,A,C), both these sort orders can be specified by the optimizer
implementor and will be optimized by the generated optimizer, while other
possible sort orders, e.g., (C,B,A), will be ignored. " [1]

Best,
Stamatis

[1]
https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Volcano-graefe.pdf

On Fri, Oct 18, 2019 at 4:56 AM Haisheng Yuan <h....@alibaba-inc.com>
wrote:

> TL;DR
> Both top-down physical TraitSet request and bottom-up TraitSet
> derivation have their strongth and weakness, we propose
> on-demand TraitSet request to combine the above two, to reduce
> the number of plan alternatives that are genereated, especially
> in distributed system.
>
> e.g.
> select * from foo join bar on f1=b1 and f2=b2 and f3=b3;
>
> In non-distributed system, we can generate a sort merge join,
> requesting foo sorted by f1,f2,f3 and bar sorted by b1,b2,b3.
> But if foo happens to be sorted by f3,f2,f1, we may miss the
> chance of making use of the delivered ordering of foo. Because
> if we require bar to be sorted by b3,b2,b1, we don't need to
> sort on foo anymore. There are so many choices, n!, not even
> considering asc/desc and null direction. We can't request all
> the possible traitsets in top-down way, and can't derive all the
> possible traitsets in bottom-up way either.
>
> We propose on-demand traitset request by adding a new type
> of metadata DerivedTraitSets into the built-in metadata system.
>
> List<RelTraitSet> deriveTraitSets(RelNode, RelMetadataQuery)
>
> In this metadata, every operator returns several possbile traitsets
> that may be derived from this operator.
>
> Using above query as an example, the tablescan on foo should
> return traiset with collation on f3, f2, f1.
>
> In physical implementation rules, e.g. the SortMergeJoinRule,
> it gets possible traitsets from both child operators, uses the join
> keys to eliminate useless traitsets, leaves out usefull traitsets,
> and requests corresponding traitset on the other child.
>
> This relies on the feature of AbstractConverter, which is turned
> off by default, due to performance issue [1].
>
> Thoughts?
>
> [1] https://issues.apache.org/jira/browse/CALCITE-2970
>
> Haisheng
>
>