Posted to dev@spark.apache.org by 吴晓菊 <ch...@gmail.com> on 2018/10/10 12:23:57 UTC

Re: [DISCUSS] Cascades style CBO for Spark SQL

Hi All,

Takeshi Yamamuro gave some comments on this topic on Twitter. After more
research, here are corrections and updates to my understanding of bottom-up
and top-down.

Bottom-up and top-down are just two strategies to enumerate join orders and
generate the search space. Both of them can reach the *same best plan* if
the statistics and the cost model are the same. (The best plan is "best"
only theoretically, since the cost model is an estimation of the performance
and of the workload of the environment, and join selectivity is also based
on estimation.)

The performance of the optimizer depends not on bottom-up vs. top-down
itself, but on the algorithms used within the bottom-up or top-down
framework.

Both bottom-up and top-down have many algorithms, some of which are similar
and compete with each other.

Bottom-up:
     DPsize, DPsub, DPccp
     (Paper: *"Analysis of Two Existing and One New Dynamic Programming
Algorithm for the Generation of Optimal Bushy Join Trees without Cross
Products"*)
Top-Down:
     Basic algorithms based on commutativity and associativity: RS-B1, RS-B2
     Graph algorithms: TDMinCutLazy, TDMinCutBranch, TDMinCutConservative, ...
     (Papers: *"Optimal Top-Down Join Enumeration"*,
                   *"Optimizing Join Enumeration in Transformation-based
Query Optimizers"*,
                   *"Effective and Robust Pruning for Top-Down Join
Enumeration Algorithms"*)
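
To make the bottom-up family above concrete, here is a minimal Scala sketch
in the spirit of DPsize/DPsub (illustrative only, not Spark code; the
bitmask encoding, the constant per-join cost and the "connected" predicate
are simplifying assumptions): plans for every subset of relations are built
level by level from smaller subsets, and only the cheapest plan per subset
is kept.

// Illustrative bottom-up join enumeration over bitmask subsets.
// Relations are numbered 0..n-1; a subset of relations is an Int bitmask.
object BottomUpSketch {
  import scala.collection.mutable

  final case class Plan(relations: Int, cost: Double, tree: String)

  // Toy cost: sum of the input costs plus an assumed constant join cost.
  private def joinCost(l: Plan, r: Plan): Double = l.cost + r.cost + 1.0

  def bestPlan(n: Int,
               baseCost: Int => Double,
               connected: (Int, Int) => Boolean): Option[Plan] = {
    val best = mutable.Map[Int, Plan]()
    for (i <- 0 until n) best(1 << i) = Plan(1 << i, baseCost(i), s"R$i")

    for (size <- 2 to n; s <- 1 until (1 << n) if Integer.bitCount(s) == size) {
      // Try every split of subset s into two non-empty halves.
      var left = (s - 1) & s
      while (left != 0) {
        val right = s ^ left
        for (l <- best.get(left); r <- best.get(right)
             if connected(left, right)) {          // skip Cartesian products
          val cand = Plan(s, joinCost(l, r), s"(${l.tree} JOIN ${r.tree})")
          if (best.get(s).forall(cand.cost < _.cost)) best(s) = cand
        }
        left = (left - 1) & s
      }
    }
    best.get((1 << n) - 1)   // None if the join graph is disconnected
  }
}

The difference between DPsize, DPsub and DPccp is essentially how cleverly
they enumerate the (left, right) splits without ever touching disconnected,
i.e. Cartesian-product, pairs.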

Before the graph algorithms for top-down appeared, it was known that
bottom-up is more efficient, especially for a Cartesian-product-free search
space, while top-down has the capability to do pruning. With the graph
algorithms, top-down enumeration can also be CP-free. More details are
provided in the paper *"Optimizing Join Enumeration in Transformation-based
Query Optimizers"*.
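
For contrast, a similarly simplified top-down sketch (again illustrative
only; the real TDMinCut* algorithms enumerate graph min-cuts rather than all
subset splits, which is what makes them CP-free): the search is memoized per
subset, and a branch is cut as soon as it can no longer beat the cheapest
complete plan found so far.

// Illustrative top-down enumeration with memoization and an upper bound
// used for branch-and-bound pruning; the constant join cost is an assumption.
object TopDownSketch {
  import scala.collection.mutable

  final case class Plan(cost: Double, tree: String)

  def search(n: Int, baseCost: Int => Double): Option[Plan] = {
    val memo = mutable.Map[Int, Plan]()

    def best(s: Int, upperBound: Double): Option[Plan] = {
      if (Integer.bitCount(s) == 1) {
        val i = Integer.numberOfTrailingZeros(s)
        Some(Plan(baseCost(i), s"R$i"))
      } else memo.get(s).orElse {
        var winner: Option[Plan] = None
        var bound = upperBound
        var left = (s - 1) & s
        while (left != 0) {
          val right = s ^ left
          for (l <- best(left, bound) if l.cost + 1.0 < bound;
               r <- best(right, bound - l.cost - 1.0)) {
            val total = l.cost + r.cost + 1.0   // assumed constant join cost
            if (total < bound) {                // prune anything above the bound
              winner = Some(Plan(total, s"(${l.tree} JOIN ${r.tree})"))
              bound = total
            }
          }
          left = (left - 1) & s
        }
        winner.foreach(p => memo(s) = p)        // memoized plans are true optima
        winner
      }
    }

    best((1 << n) - 1, Double.PositiveInfinity)
  }
}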

In conclusion, my suggestion is to implement a Cascades-like top-down
optimizer that is based on the best graph algorithm, is CP-free, and has
pruning enabled. A good cost model should also be provided, one that is
based on physical implementations: for example, hash join and sort-merge
join have the same input and output, but the time spent on reading,
computing and copying is different. Details can be similar to what is done
in Hive (
https://cwiki.apache.org/confluence/display/Hive/Cost-based+optimization+in+Hive
).
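
As a strawman for the "cost model based on physical implementations" part,
here is a sketch of what such a cost could look like; every operator shape,
constant and formula below is a made-up placeholder, not a concrete
proposal:

// Illustrative cost shapes for two physical joins that consume the same
// logical input but spend their time differently.
object PhysicalCostSketch {
  final case class Stats(rowCount: Long, avgRowSize: Long) {
    def bytes: Double = rowCount.toDouble * avgRowSize
  }
  final case class Cost(cpu: Double, network: Double) {
    def total: Double = cpu + network
  }

  // Sort-merge join: shuffles both sides, pays n*log(n) sort CPU plus a merge.
  def sortMergeJoin(left: Stats, right: Stats): Cost = {
    def sortCpu(s: Stats) = s.rowCount * math.log(s.rowCount.max(2L).toDouble)
    Cost(
      cpu = sortCpu(left) + sortCpu(right) + left.rowCount + right.rowCount,
      network = left.bytes + right.bytes)         // full shuffle exchange
  }

  // Broadcast hash join: no shuffle of the big side, but the small side is
  // shipped to every executor and a hash table is built on it.
  def broadcastHashJoin(big: Stats, small: Stats, numExecutors: Int): Cost =
    Cost(
      cpu = small.rowCount.toDouble + big.rowCount, // build + probe
      network = small.bytes * numExecutors)

  def cheaperOf(a: (String, Cost), b: (String, Cost)): String =
    if (a._2.total <= b._2.total) a._1 else b._1
}

With something like this the planner can compare, say,
cheaperOf(("SMJ", sortMergeJoin(l, r)), ("BHJ", broadcastHashJoin(l, r, 200)))
instead of costing both alternatives as the same logical join node.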

@Xiao Li <ga...@gmail.com> @Yamamuro, any comments?

Thanks,
Xiaoju



吴晓菊 <ch...@gmail.com> wrote on Wednesday, September 26, 2018 at 10:39 AM:

> Hi Xiao
>
> I quite agree with you that a good cost model is important, instead of the
> current stats-based cost.
>
> However, I think the bottom-up framework itself has a limitation, since it
> only keeps one best plan at each level, and that plan is not necessarily
> part of the exact best plan at the final level. If you want the exact best
> plan overall in the current bottom-up framework, you need to enumerate all
> alternative plans and compare their costs.
>
> The Volcano/Cascades framework provides a more efficient solution, which is
> already used in Calcite, Greenplum, SQL Server, etc.
>
> So I think both framework and cost model are important.
>
> We are now working on a Cascades POC and are also considering a new cost
> model. We want to know whether the community is interested in this feature.
> If yes, we can share a more detailed design and discuss it with you.
>
> Thanks,
> Xiaoju
>
>
>
> Xiao Li <ga...@gmail.com> wrote on Wednesday, September 26, 2018 at 8:30 AM:
>
>> Hi, Xiaoju,
>>
>> Thanks for sending this to the dev list. The current join reordering rule
>> is just a stats-based optimizer rule. Either top-down or bottom-up
>> optimization can achieve the same level of optimized plans. DB2 is using
>> bottom-up. In the future, we plan to move the stats-based join reordering
>> rule to the cost-based planner, which is the right place for this rule
>> based on the original design of Spark SQL.
>>
>> Actually, building a good cost model is much more difficult than
>> implementing such a classic framework, especially when Spark does not own
>> the data. Also, we need to compute incremental stats instead of always
>> recomputing the stats.
>>
>> Cheers,
>>
>> Xiao
>>
>>
>>
>> 吴晓菊 <ch...@gmail.com> wrote on Monday, September 24, 2018 at 7:53 PM:
>>
>>> Hi All,
>>>
>>> The current Spark CBO implements a cost-based multi-way join reordering
>>> algorithm based on the System-R paper [Access Path-SIGMOD'79]
>>> <http://x86.cs.duke.edu/courses/spring03/cps216/papers/selinger-etal-1979.pdf>.
>>> When building m-way joins, it uses a bottom-up approach: it puts all items
>>> (basic joined nodes) into level 0, then builds all two-way joins at level 1
>>> from plans at level 0 (single items), then builds all 3-way joins, etc.
>>> The algorithm considers all combinations, including left-deep trees,
>>> bushy trees, and right-deep trees, and it also prunes Cartesian product
>>> candidates.
>>>
>>> However, we still found several *limitations* in the current CBO
>>> implementation:
>>> 1. The current CBO is a rule in the logical phase; it only hands one
>>> logical plan to the physical optimization phase, and we cannot make sure
>>> that the best plan in the logical phase is still the best after physical
>>> optimization.
>>>
>>> 2. In the current bottom-up approach, we keep only one best plan for each
>>> level, and we cannot make sure that the best plans of each level lead to
>>> the exact best plan overall.
>>>
>>> 3. The current cost formula is
>>>    cost = weight * cardinality + (1.0 - weight) * size,
>>> where the first portion roughly corresponds to the CPU cost and the second
>>> portion roughly corresponds to the I/O cost. The formula is
>>> over-simplified: it treats all the join implementations the same way and
>>> doesn't take shuffle and sort cost into consideration, while the Shuffle
>>> Exchange is one of the heaviest physical operators in Spark SQL. (A small
>>> sketch of this formula follows these items.)
>>>
>>> 4. Equivalent join conditions are not supported. For example, (A join B
>>> join C on a=b and b=c) can be reordered to (A join C join B on a=c and c=b),
>>> which may be more efficient. In the current implementation, however, we
>>> will not derive the condition "a=c", so we will treat "A join C" like a
>>> Cartesian Product and then exclude it.
>>>
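>>> (To make item 3 concrete, a tiny sketch of that formula; the default
>>> weight of 0.7 and the helper names here are assumptions for illustration,
>>> not Spark's exact code.)
>>>
>>> case class PlanCost(cardinality: BigInt, size: BigInt)
>>>
>>> // cost = weight * cardinality + (1 - weight) * size; the same formula is
>>> // applied to every join implementation and never sees shuffle or sort,
>>> // which is exactly the limitation above.
>>> def cost(c: PlanCost, weight: Double = 0.7): Double =
>>>   weight * c.cardinality.toDouble + (1.0 - weight) * c.size.toDouble
>>>
>>> def betterThan(a: PlanCost, b: PlanCost): Boolean = cost(a) < cost(b)
>>>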
>>> The bottom-up approach first came from the System-R optimizer (1979). It
>>> quickly became a standard, and many of the modern relational database
>>> optimizers are "System-R style", for example Oracle, PostgreSQL, MySQL and
>>> DB2.
>>>
>>> As time went by, new styles of optimizer were invented: Volcano (1993) and
>>> Cascades (1995). They are not as famous as System-R but are still widely
>>> used in practice: Microsoft SQL Server, Greenplum Orca, Apache Calcite.
>>> They implement a top-down transformational search algorithm and provide an
>>> extensible optimization framework.
>>>
>>> A top-down optimization framework can help us solve the above
>>> limitations, since it has a more complete search space and combines the
>>> logical and physical phases for a more accurate cost estimation. As for
>>> the efficiency of keeping all alternative plans, Cascades also provides
>>> pruning to reduce the search space.
>>>
>>> What about implementing *a new Cascades-style CBO for Spark SQL*?
>>> It could be a new rule in the current "Planner" which reads a logical plan
>>> after the heuristic rules and outputs the best physical plan, with the
>>> least cost, after reordering and physical implementation rules.
>>>
>>> Xiaoju Wu
>>> Phone:+86 17717640807
>>>
>>>