You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/03/31 07:55:39 UTC

[GitHub] [spark] HyukjinKwon commented on a change in pull request #30965: [SPARK-33935][SQL] Fix CBO cost function

HyukjinKwon commented on a change in pull request #30965:
URL: https://github.com/apache/spark/pull/30965#discussion_r604675022



##########
File path: sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q19.sf100/simplified.txt
##########
@@ -6,71 +6,71 @@ TakeOrderedAndProject [ext_price,brand,brand_id,i_manufact_id,i_manufact]
           WholeStageCodegen (12)
             HashAggregate [i_brand,i_brand_id,i_manufact_id,i_manufact,ss_ext_sales_price] [sum,sum]
               Project [ss_ext_sales_price,i_brand_id,i_brand,i_manufact_id,i_manufact]
-                SortMergeJoin [ss_customer_sk,c_customer_sk,ca_zip,s_zip]
-                  InputAdapter
-                    WholeStageCodegen (5)
-                      Sort [ss_customer_sk]
-                        InputAdapter
-                          Exchange [ss_customer_sk] #2
-                            WholeStageCodegen (4)
-                              Project [ss_customer_sk,ss_ext_sales_price,i_brand_id,i_brand,i_manufact_id,i_manufact,s_zip]
-                                BroadcastHashJoin [ss_store_sk,s_store_sk]
-                                  Project [ss_customer_sk,ss_store_sk,ss_ext_sales_price,i_brand_id,i_brand,i_manufact_id,i_manufact]
-                                    BroadcastHashJoin [ss_sold_date_sk,d_date_sk]
-                                      Project [ss_sold_date_sk,ss_customer_sk,ss_store_sk,ss_ext_sales_price,i_brand_id,i_brand,i_manufact_id,i_manufact]
-                                        BroadcastHashJoin [ss_item_sk,i_item_sk]
-                                          Filter [ss_sold_date_sk,ss_item_sk,ss_customer_sk,ss_store_sk]
-                                            ColumnarToRow
-                                              InputAdapter
-                                                Scan parquet default.store_sales [ss_sold_date_sk,ss_item_sk,ss_customer_sk,ss_store_sk,ss_ext_sales_price]
+                BroadcastHashJoin [ss_item_sk,i_item_sk]

Review comment:
       This actually caused a significant regression at q19 in TPC-DS benchmark (performed internally). Can we just revert it for now, and do it with actual performance numbers? I think that's safer and easier for everybody here. Seems like At least the plan change here is overlooked and the performance has to be clarified.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org