Posted to issues@spark.apache.org by "Xianyin Xin (JIRA)" <ji...@apache.org> on 2019/06/17 03:06:00 UTC

[jira] [Commented] (SPARK-27714) Support Join Reorder based on Genetic Algorithm when the # of joined tables > 12

    [ https://issues.apache.org/jira/browse/SPARK-27714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16865272#comment-16865272 ] 

Xianyin Xin commented on SPARK-27714:
-------------------------------------

[~nkollar], sorry for the late reply. Yes, it is similar to the implementation in Postgres. However, it is not a replacement for, or an alternative to, the current join reorder logic (DP), but a supplement to it. DP is used when the number of joined tables is small (<12 now in Spark), while GA is used when the number of joined tables is large, because as the number of joined tables grows, DP spends a lot of time finding the best join plan. GA can accelerate the "best plan" search.
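
To make the idea concrete, here is a minimal sketch of a GA over join orders. It is not the code proposed for Spark and not Postgres's GEQO; it only illustrates the general technique. Everything in it is an assumption made for illustration: the table names, the row counts in ROWS, the SELECTIVITY values, the toy cost model (sum of intermediate result sizes of a left-deep plan), and the choice of order crossover plus swap mutation.

```python
import random

# Hypothetical toy statistics (not from any real workload).
ROWS = {"a": 1_000_000, "b": 50_000, "c": 2_000, "d": 300, "e": 10_000, "f": 40}
SELECTIVITY = {frozenset(pair): sel for pair, sel in [
    (("a", "b"), 1e-5), (("b", "c"), 1e-3), (("c", "d"), 1e-2),
    (("a", "e"), 1e-4), (("e", "f"), 1e-1),
]}

def plan_cost(order):
    """Toy cost of a left-deep plan: the sum of intermediate result sizes."""
    joined = [order[0]]
    rows = float(ROWS[order[0]])
    cost = 0.0
    for table in order[1:]:
        rows *= ROWS[table]
        # Apply every join predicate between the new table and the joined ones;
        # pairs without a predicate behave like a cartesian product.
        for other in joined:
            rows *= SELECTIVITY.get(frozenset((other, table)), 1.0)
        joined.append(table)
        cost += rows
    return cost

def order_crossover(p1, p2, rng):
    """Order crossover: keep a slice of p1, fill the rest in p2's order."""
    i, j = sorted(rng.sample(range(len(p1)), 2))
    kept = p1[i:j + 1]
    rest = [t for t in p2 if t not in kept]
    return rest[:i] + kept + rest[i:]

def mutate(order, rng, rate=0.2):
    """With probability `rate`, swap two positions in the join order."""
    order = list(order)
    if rng.random() < rate:
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
    return order

def ga_join_order(tables, pop_size=30, generations=40, seed=0):
    """Evolve a population of join orders and return the cheapest one found."""
    rng = random.Random(seed)
    population = [rng.sample(tables, len(tables)) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=plan_cost)
        survivors = population[: pop_size // 2]  # the cheaper half survives
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            children.append(mutate(order_crossover(p1, p2, rng), rng))
        population = survivors + children
    return min(population, key=plan_cost)

if __name__ == "__main__":
    best = ga_join_order(list(ROWS))
    print(best, plan_cost(best))
```

The work done is bounded by pop_size * generations cost evaluations regardless of how many tables are joined, which is why a GA stays cheap where exhaustive DP does not. The trade-off is that the result is only the best plan found, not a guaranteed optimum.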

TPC-DS q64 is an example. In our experiment, the execution time of 10TB TPC-DS q64 decreased from 1300+ s to 200+ s on an 18-node cluster.

> Support Join Reorder based on Genetic Algorithm when the # of joined tables > 12
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-27714
>                 URL: https://issues.apache.org/jira/browse/SPARK-27714
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Xianyin Xin
>            Priority: Major
>
> Currently the join reorder logic is based on dynamic programming, which can theoretically find the optimal plan, but the search cost grows rapidly as the # of joined tables grows. It would be better to introduce a genetic algorithm (GA) to overcome this problem.
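
As a rough illustration of how fast the DP search space grows, the sketch below counts the (subset, two-way split) pairs an exhaustive DP over bushy join trees would examine for n tables. It assumes every pair of tables is joinable, so it is an upper bound for illustration only; it does not model the pruning that Spark's actual join reorder rule applies. The function names are hypothetical.

```python
from math import comb

def dp_split_count(n):
    """Count unordered two-way splits over all table subsets of size >= 2."""
    total = 0
    for size in range(2, n + 1):
        # A subset of `size` tables has 2**(size - 1) - 1 unordered splits
        # into two non-empty parts.
        total += comb(n, size) * (2 ** (size - 1) - 1)
    return total

def dp_split_count_closed_form(n):
    """Closed form of the same sum: (3**n - 2**(n + 1) + 1) / 2."""
    return (3 ** n - 2 ** (n + 1) + 1) // 2

if __name__ == "__main__":
    for n in (4, 8, 12, 16, 20):
        print(n, dp_split_count(n))
```

The count grows roughly as 3^n, so each additional table about triples the work, which is the motivation for switching to a heuristic search above some table-count threshold.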



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
