You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/01/02 19:57:51 UTC

[GitHub] [spark] tanelk commented on a change in pull request #30996: [Spark-33964][SQL] Combine distinct unions in more cases

tanelk commented on a change in pull request #30996:
URL: https://github.com/apache/spark/pull/30996#discussion_r550915466



##########
File path: sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q75.sf100/explain.txt
##########
@@ -327,426 +319,386 @@ Join condition: None
 Output [7]: [d_year#14, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, (ss_quantity#27 - coalesce(sr_return_quantity#32, 0)) AS sales_cnt#35, CheckOverflow((promote_precision(cast(ss_ext_sales_price#28 as decimal(8,2))) - promote_precision(cast(coalesce(sr_return_amt#33, 0.00) as decimal(8,2)))), DecimalType(8,2), true) AS sales_amt#36]
 Input [13]: [ss_item_sk#25, ss_ticket_number#26, ss_quantity#27, ss_ext_sales_price#28, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, d_year#14, sr_item_sk#30, sr_ticket_number#31, sr_return_quantity#32, sr_return_amt#33]
 
-(44) Union
-
-(45) HashAggregate [codegen id : 15]
-Input [7]: [d_year#14, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, sales_cnt#22, sales_amt#23]
-Keys [7]: [d_year#14, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, sales_cnt#22, sales_amt#23]
-Functions: []
-Aggregate Attributes: []
-Results [7]: [d_year#14, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, sales_cnt#22, sales_amt#23]
-
-(46) Exchange
-Input [7]: [d_year#14, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, sales_cnt#22, sales_amt#23]
-Arguments: hashpartitioning(d_year#14, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, sales_cnt#22, sales_amt#23, 5), true, [id=#37]
-
-(47) HashAggregate [codegen id : 16]
-Input [7]: [d_year#14, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, sales_cnt#22, sales_amt#23]
-Keys [7]: [d_year#14, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, sales_cnt#22, sales_amt#23]
-Functions: []
-Aggregate Attributes: []
-Results [7]: [d_year#14, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, sales_cnt#22, sales_amt#23]
-
-(48) Scan parquet default.web_sales
-Output [5]: [ws_sold_date_sk#38, ws_item_sk#39, ws_order_number#40, ws_quantity#41, ws_ext_sales_price#42]
+(44) Scan parquet default.web_sales
+Output [5]: [ws_sold_date_sk#37, ws_item_sk#38, ws_order_number#39, ws_quantity#40, ws_ext_sales_price#41]
 Batched: true
 Location [not included in comparison]/{warehouse_dir}/web_sales]
 PushedFilters: [IsNotNull(ws_item_sk), IsNotNull(ws_sold_date_sk)]
 ReadSchema: struct<ws_sold_date_sk:int,ws_item_sk:int,ws_order_number:int,ws_quantity:int,ws_ext_sales_price:decimal(7,2)>
 
-(49) ColumnarToRow [codegen id : 19]
-Input [5]: [ws_sold_date_sk#38, ws_item_sk#39, ws_order_number#40, ws_quantity#41, ws_ext_sales_price#42]
+(45) ColumnarToRow [codegen id : 17]
+Input [5]: [ws_sold_date_sk#37, ws_item_sk#38, ws_order_number#39, ws_quantity#40, ws_ext_sales_price#41]
 
-(50) Filter [codegen id : 19]
-Input [5]: [ws_sold_date_sk#38, ws_item_sk#39, ws_order_number#40, ws_quantity#41, ws_ext_sales_price#42]
-Condition : (isnotnull(ws_item_sk#39) AND isnotnull(ws_sold_date_sk#38))
+(46) Filter [codegen id : 17]
+Input [5]: [ws_sold_date_sk#37, ws_item_sk#38, ws_order_number#39, ws_quantity#40, ws_ext_sales_price#41]
+Condition : (isnotnull(ws_item_sk#38) AND isnotnull(ws_sold_date_sk#37))
 
-(51) ReusedExchange [Reuses operator id: 8]
+(47) ReusedExchange [Reuses operator id: 8]
 Output [5]: [i_item_sk#6, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11]
 
-(52) BroadcastHashJoin [codegen id : 19]
-Left keys [1]: [ws_item_sk#39]
+(48) BroadcastHashJoin [codegen id : 17]
+Left keys [1]: [ws_item_sk#38]
 Right keys [1]: [i_item_sk#6]
 Join condition: None
 
-(53) Project [codegen id : 19]
-Output [9]: [ws_sold_date_sk#38, ws_item_sk#39, ws_order_number#40, ws_quantity#41, ws_ext_sales_price#42, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11]
-Input [10]: [ws_sold_date_sk#38, ws_item_sk#39, ws_order_number#40, ws_quantity#41, ws_ext_sales_price#42, i_item_sk#6, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11]
+(49) Project [codegen id : 17]
+Output [9]: [ws_sold_date_sk#37, ws_item_sk#38, ws_order_number#39, ws_quantity#40, ws_ext_sales_price#41, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11]
+Input [10]: [ws_sold_date_sk#37, ws_item_sk#38, ws_order_number#39, ws_quantity#40, ws_ext_sales_price#41, i_item_sk#6, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11]
 
-(54) ReusedExchange [Reuses operator id: 14]
+(50) ReusedExchange [Reuses operator id: 14]
 Output [2]: [d_date_sk#13, d_year#14]
 
-(55) BroadcastHashJoin [codegen id : 19]
-Left keys [1]: [ws_sold_date_sk#38]
+(51) BroadcastHashJoin [codegen id : 17]
+Left keys [1]: [ws_sold_date_sk#37]
 Right keys [1]: [d_date_sk#13]
 Join condition: None
 
-(56) Project [codegen id : 19]
-Output [9]: [ws_item_sk#39, ws_order_number#40, ws_quantity#41, ws_ext_sales_price#42, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, d_year#14]
-Input [11]: [ws_sold_date_sk#38, ws_item_sk#39, ws_order_number#40, ws_quantity#41, ws_ext_sales_price#42, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, d_date_sk#13, d_year#14]
+(52) Project [codegen id : 17]
+Output [9]: [ws_item_sk#38, ws_order_number#39, ws_quantity#40, ws_ext_sales_price#41, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, d_year#14]
+Input [11]: [ws_sold_date_sk#37, ws_item_sk#38, ws_order_number#39, ws_quantity#40, ws_ext_sales_price#41, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, d_date_sk#13, d_year#14]
 
-(57) Exchange
-Input [9]: [ws_item_sk#39, ws_order_number#40, ws_quantity#41, ws_ext_sales_price#42, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, d_year#14]
-Arguments: hashpartitioning(cast(ws_order_number#40 as bigint), cast(ws_item_sk#39 as bigint), 5), true, [id=#43]
+(53) Exchange
+Input [9]: [ws_item_sk#38, ws_order_number#39, ws_quantity#40, ws_ext_sales_price#41, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, d_year#14]
+Arguments: hashpartitioning(cast(ws_order_number#39 as bigint), cast(ws_item_sk#38 as bigint), 5), ENSURE_REQUIREMENTS, [id=#42]
 
-(58) Sort [codegen id : 20]
-Input [9]: [ws_item_sk#39, ws_order_number#40, ws_quantity#41, ws_ext_sales_price#42, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, d_year#14]
-Arguments: [cast(ws_order_number#40 as bigint) ASC NULLS FIRST, cast(ws_item_sk#39 as bigint) ASC NULLS FIRST], false, 0
+(54) Sort [codegen id : 18]
+Input [9]: [ws_item_sk#38, ws_order_number#39, ws_quantity#40, ws_ext_sales_price#41, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, d_year#14]
+Arguments: [cast(ws_order_number#39 as bigint) ASC NULLS FIRST, cast(ws_item_sk#38 as bigint) ASC NULLS FIRST], false, 0
 
-(59) Scan parquet default.web_returns
-Output [4]: [wr_item_sk#44, wr_order_number#45, wr_return_quantity#46, wr_return_amt#47]
+(55) Scan parquet default.web_returns
+Output [4]: [wr_item_sk#43, wr_order_number#44, wr_return_quantity#45, wr_return_amt#46]
 Batched: true
 Location [not included in comparison]/{warehouse_dir}/web_returns]
 PushedFilters: [IsNotNull(wr_order_number), IsNotNull(wr_item_sk)]
 ReadSchema: struct<wr_item_sk:bigint,wr_order_number:bigint,wr_return_quantity:int,wr_return_amt:decimal(7,2)>
 
-(60) ColumnarToRow [codegen id : 21]
-Input [4]: [wr_item_sk#44, wr_order_number#45, wr_return_quantity#46, wr_return_amt#47]
+(56) ColumnarToRow [codegen id : 19]
+Input [4]: [wr_item_sk#43, wr_order_number#44, wr_return_quantity#45, wr_return_amt#46]
 
-(61) Filter [codegen id : 21]
-Input [4]: [wr_item_sk#44, wr_order_number#45, wr_return_quantity#46, wr_return_amt#47]
-Condition : (isnotnull(wr_order_number#45) AND isnotnull(wr_item_sk#44))
+(57) Filter [codegen id : 19]
+Input [4]: [wr_item_sk#43, wr_order_number#44, wr_return_quantity#45, wr_return_amt#46]
+Condition : (isnotnull(wr_order_number#44) AND isnotnull(wr_item_sk#43))
 
-(62) Exchange
-Input [4]: [wr_item_sk#44, wr_order_number#45, wr_return_quantity#46, wr_return_amt#47]
-Arguments: hashpartitioning(wr_order_number#45, wr_item_sk#44, 5), true, [id=#48]
+(58) Exchange
+Input [4]: [wr_item_sk#43, wr_order_number#44, wr_return_quantity#45, wr_return_amt#46]
+Arguments: hashpartitioning(wr_order_number#44, wr_item_sk#43, 5), ENSURE_REQUIREMENTS, [id=#47]
 
-(63) Sort [codegen id : 22]
-Input [4]: [wr_item_sk#44, wr_order_number#45, wr_return_quantity#46, wr_return_amt#47]
-Arguments: [wr_order_number#45 ASC NULLS FIRST, wr_item_sk#44 ASC NULLS FIRST], false, 0
+(59) Sort [codegen id : 20]
+Input [4]: [wr_item_sk#43, wr_order_number#44, wr_return_quantity#45, wr_return_amt#46]
+Arguments: [wr_order_number#44 ASC NULLS FIRST, wr_item_sk#43 ASC NULLS FIRST], false, 0
 
-(64) SortMergeJoin
-Left keys [2]: [cast(ws_order_number#40 as bigint), cast(ws_item_sk#39 as bigint)]
-Right keys [2]: [wr_order_number#45, wr_item_sk#44]
+(60) SortMergeJoin
+Left keys [2]: [cast(ws_order_number#39 as bigint), cast(ws_item_sk#38 as bigint)]
+Right keys [2]: [wr_order_number#44, wr_item_sk#43]
 Join condition: None
 
-(65) Project [codegen id : 23]
-Output [7]: [d_year#14, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, (ws_quantity#41 - coalesce(wr_return_quantity#46, 0)) AS sales_cnt#49, CheckOverflow((promote_precision(cast(ws_ext_sales_price#42 as decimal(8,2))) - promote_precision(cast(coalesce(wr_return_amt#47, 0.00) as decimal(8,2)))), DecimalType(8,2), true) AS sales_amt#50]
-Input [13]: [ws_item_sk#39, ws_order_number#40, ws_quantity#41, ws_ext_sales_price#42, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, d_year#14, wr_item_sk#44, wr_order_number#45, wr_return_quantity#46, wr_return_amt#47]
+(61) Project [codegen id : 21]
+Output [7]: [d_year#14, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, (ws_quantity#40 - coalesce(wr_return_quantity#45, 0)) AS sales_cnt#48, CheckOverflow((promote_precision(cast(ws_ext_sales_price#41 as decimal(8,2))) - promote_precision(cast(coalesce(wr_return_amt#46, 0.00) as decimal(8,2)))), DecimalType(8,2), true) AS sales_amt#49]
+Input [13]: [ws_item_sk#38, ws_order_number#39, ws_quantity#40, ws_ext_sales_price#41, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, d_year#14, wr_item_sk#43, wr_order_number#44, wr_return_quantity#45, wr_return_amt#46]
 
-(66) Union
+(62) Union
 
-(67) HashAggregate [codegen id : 24]
+(63) HashAggregate [codegen id : 22]
 Input [7]: [d_year#14, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, sales_cnt#22, sales_amt#23]
 Keys [7]: [d_year#14, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, sales_cnt#22, sales_amt#23]
 Functions: []
 Aggregate Attributes: []
 Results [7]: [d_year#14, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, sales_cnt#22, sales_amt#23]
 
-(68) Exchange
+(64) Exchange
 Input [7]: [d_year#14, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, sales_cnt#22, sales_amt#23]
-Arguments: hashpartitioning(d_year#14, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, sales_cnt#22, sales_amt#23, 5), true, [id=#51]
+Arguments: hashpartitioning(d_year#14, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, sales_cnt#22, sales_amt#23, 5), ENSURE_REQUIREMENTS, [id=#50]
 
-(69) HashAggregate [codegen id : 25]
+(65) HashAggregate [codegen id : 23]
 Input [7]: [d_year#14, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, sales_cnt#22, sales_amt#23]
 Keys [7]: [d_year#14, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, sales_cnt#22, sales_amt#23]
 Functions: []
 Aggregate Attributes: []
 Results [7]: [d_year#14, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, sales_cnt#22, sales_amt#23]
 
-(70) HashAggregate [codegen id : 25]
+(66) HashAggregate [codegen id : 23]
 Input [7]: [d_year#14, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, sales_cnt#22, sales_amt#23]
 Keys [5]: [d_year#14, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11]
 Functions [2]: [partial_sum(cast(sales_cnt#22 as bigint)), partial_sum(UnscaledValue(sales_amt#23))]
-Aggregate Attributes [2]: [sum#52, sum#53]
-Results [7]: [d_year#14, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, sum#54, sum#55]
+Aggregate Attributes [2]: [sum#51, sum#52]
+Results [7]: [d_year#14, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, sum#53, sum#54]
 
-(71) Exchange
-Input [7]: [d_year#14, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, sum#54, sum#55]
-Arguments: hashpartitioning(d_year#14, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, 5), true, [id=#56]
+(67) Exchange
+Input [7]: [d_year#14, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, sum#53, sum#54]
+Arguments: hashpartitioning(d_year#14, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, 5), ENSURE_REQUIREMENTS, [id=#55]
 
-(72) HashAggregate [codegen id : 26]
-Input [7]: [d_year#14, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, sum#54, sum#55]
+(68) HashAggregate [codegen id : 24]
+Input [7]: [d_year#14, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, sum#53, sum#54]
 Keys [5]: [d_year#14, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11]
 Functions [2]: [sum(cast(sales_cnt#22 as bigint)), sum(UnscaledValue(sales_amt#23))]
-Aggregate Attributes [2]: [sum(cast(sales_cnt#22 as bigint))#57, sum(UnscaledValue(sales_amt#23))#58]
-Results [7]: [d_year#14, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, sum(cast(sales_cnt#22 as bigint))#57 AS sales_cnt#59, MakeDecimal(sum(UnscaledValue(sales_amt#23))#58,18,2) AS sales_amt#60]
+Aggregate Attributes [2]: [sum(cast(sales_cnt#22 as bigint))#56, sum(UnscaledValue(sales_amt#23))#57]
+Results [7]: [d_year#14, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, sum(cast(sales_cnt#22 as bigint))#56 AS sales_cnt#58, MakeDecimal(sum(UnscaledValue(sales_amt#23))#57,18,2) AS sales_amt#59]
 
-(73) Exchange
-Input [7]: [d_year#14, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, sales_cnt#59, sales_amt#60]
-Arguments: hashpartitioning(i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, 5), true, [id=#61]
+(69) Exchange
+Input [7]: [d_year#14, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, sales_cnt#58, sales_amt#59]
+Arguments: hashpartitioning(i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, 5), ENSURE_REQUIREMENTS, [id=#60]
 
-(74) Sort [codegen id : 27]
-Input [7]: [d_year#14, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, sales_cnt#59, sales_amt#60]
+(70) Sort [codegen id : 25]
+Input [7]: [d_year#14, i_brand_id#7, i_class_id#8, i_category_id#9, i_manufact_id#11, sales_cnt#58, sales_amt#59]
 Arguments: [i_brand_id#7 ASC NULLS FIRST, i_class_id#8 ASC NULLS FIRST, i_category_id#9 ASC NULLS FIRST, i_manufact_id#11 ASC NULLS FIRST], false, 0
 
-(75) Scan parquet default.catalog_sales
+(71) Scan parquet default.catalog_sales
 Output [5]: [cs_sold_date_sk#1, cs_item_sk#2, cs_order_number#3, cs_quantity#4, cs_ext_sales_price#5]
 Batched: true
 Location [not included in comparison]/{warehouse_dir}/catalog_sales]
 PushedFilters: [IsNotNull(cs_item_sk), IsNotNull(cs_sold_date_sk)]
 ReadSchema: struct<cs_sold_date_sk:int,cs_item_sk:int,cs_order_number:int,cs_quantity:int,cs_ext_sales_price:decimal(7,2)>
 
-(76) ColumnarToRow [codegen id : 30]
+(72) ColumnarToRow [codegen id : 28]
 Input [5]: [cs_sold_date_sk#1, cs_item_sk#2, cs_order_number#3, cs_quantity#4, cs_ext_sales_price#5]
 
-(77) Filter [codegen id : 30]
+(73) Filter [codegen id : 28]
 Input [5]: [cs_sold_date_sk#1, cs_item_sk#2, cs_order_number#3, cs_quantity#4, cs_ext_sales_price#5]
 Condition : (isnotnull(cs_item_sk#2) AND isnotnull(cs_sold_date_sk#1))
 
-(78) ReusedExchange [Reuses operator id: 8]
-Output [5]: [i_item_sk#62, i_brand_id#63, i_class_id#64, i_category_id#65, i_manufact_id#66]
+(74) ReusedExchange [Reuses operator id: 8]
+Output [5]: [i_item_sk#61, i_brand_id#62, i_class_id#63, i_category_id#64, i_manufact_id#65]
 
-(79) BroadcastHashJoin [codegen id : 30]
+(75) BroadcastHashJoin [codegen id : 28]
 Left keys [1]: [cs_item_sk#2]
-Right keys [1]: [i_item_sk#62]
+Right keys [1]: [i_item_sk#61]
 Join condition: None
 
-(80) Project [codegen id : 30]
-Output [9]: [cs_sold_date_sk#1, cs_item_sk#2, cs_order_number#3, cs_quantity#4, cs_ext_sales_price#5, i_brand_id#63, i_class_id#64, i_category_id#65, i_manufact_id#66]
-Input [10]: [cs_sold_date_sk#1, cs_item_sk#2, cs_order_number#3, cs_quantity#4, cs_ext_sales_price#5, i_item_sk#62, i_brand_id#63, i_class_id#64, i_category_id#65, i_manufact_id#66]
+(76) Project [codegen id : 28]
+Output [9]: [cs_sold_date_sk#1, cs_item_sk#2, cs_order_number#3, cs_quantity#4, cs_ext_sales_price#5, i_brand_id#62, i_class_id#63, i_category_id#64, i_manufact_id#65]
+Input [10]: [cs_sold_date_sk#1, cs_item_sk#2, cs_order_number#3, cs_quantity#4, cs_ext_sales_price#5, i_item_sk#61, i_brand_id#62, i_class_id#63, i_category_id#64, i_manufact_id#65]
 
-(81) Scan parquet default.date_dim
-Output [2]: [d_date_sk#67, d_year#68]
+(77) Scan parquet default.date_dim
+Output [2]: [d_date_sk#66, d_year#67]
 Batched: true
 Location [not included in comparison]/{warehouse_dir}/date_dim]
 PushedFilters: [IsNotNull(d_year), EqualTo(d_year,2001), IsNotNull(d_date_sk)]
 ReadSchema: struct<d_date_sk:int,d_year:int>
 
-(82) ColumnarToRow [codegen id : 29]
-Input [2]: [d_date_sk#67, d_year#68]
+(78) ColumnarToRow [codegen id : 27]
+Input [2]: [d_date_sk#66, d_year#67]
 
-(83) Filter [codegen id : 29]
-Input [2]: [d_date_sk#67, d_year#68]
-Condition : ((isnotnull(d_year#68) AND (d_year#68 = 2001)) AND isnotnull(d_date_sk#67))
+(79) Filter [codegen id : 27]
+Input [2]: [d_date_sk#66, d_year#67]
+Condition : ((isnotnull(d_year#67) AND (d_year#67 = 2001)) AND isnotnull(d_date_sk#66))
 
-(84) BroadcastExchange
-Input [2]: [d_date_sk#67, d_year#68]
-Arguments: HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)),false), [id=#69]
+(80) BroadcastExchange
+Input [2]: [d_date_sk#66, d_year#67]
+Arguments: HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)),false), [id=#68]
 
-(85) BroadcastHashJoin [codegen id : 30]
+(81) BroadcastHashJoin [codegen id : 28]
 Left keys [1]: [cs_sold_date_sk#1]
-Right keys [1]: [d_date_sk#67]
+Right keys [1]: [d_date_sk#66]
 Join condition: None
 
-(86) Project [codegen id : 30]
-Output [9]: [cs_item_sk#2, cs_order_number#3, cs_quantity#4, cs_ext_sales_price#5, i_brand_id#63, i_class_id#64, i_category_id#65, i_manufact_id#66, d_year#68]
-Input [11]: [cs_sold_date_sk#1, cs_item_sk#2, cs_order_number#3, cs_quantity#4, cs_ext_sales_price#5, i_brand_id#63, i_class_id#64, i_category_id#65, i_manufact_id#66, d_date_sk#67, d_year#68]
+(82) Project [codegen id : 28]
+Output [9]: [cs_item_sk#2, cs_order_number#3, cs_quantity#4, cs_ext_sales_price#5, i_brand_id#62, i_class_id#63, i_category_id#64, i_manufact_id#65, d_year#67]
+Input [11]: [cs_sold_date_sk#1, cs_item_sk#2, cs_order_number#3, cs_quantity#4, cs_ext_sales_price#5, i_brand_id#62, i_class_id#63, i_category_id#64, i_manufact_id#65, d_date_sk#66, d_year#67]
 
-(87) Exchange
-Input [9]: [cs_item_sk#2, cs_order_number#3, cs_quantity#4, cs_ext_sales_price#5, i_brand_id#63, i_class_id#64, i_category_id#65, i_manufact_id#66, d_year#68]
-Arguments: hashpartitioning(cs_order_number#3, cs_item_sk#2, 5), true, [id=#70]
+(83) Exchange
+Input [9]: [cs_item_sk#2, cs_order_number#3, cs_quantity#4, cs_ext_sales_price#5, i_brand_id#62, i_class_id#63, i_category_id#64, i_manufact_id#65, d_year#67]
+Arguments: hashpartitioning(cs_order_number#3, cs_item_sk#2, 5), ENSURE_REQUIREMENTS, [id=#69]
 
-(88) Sort [codegen id : 31]
-Input [9]: [cs_item_sk#2, cs_order_number#3, cs_quantity#4, cs_ext_sales_price#5, i_brand_id#63, i_class_id#64, i_category_id#65, i_manufact_id#66, d_year#68]
+(84) Sort [codegen id : 29]
+Input [9]: [cs_item_sk#2, cs_order_number#3, cs_quantity#4, cs_ext_sales_price#5, i_brand_id#62, i_class_id#63, i_category_id#64, i_manufact_id#65, d_year#67]
 Arguments: [cs_order_number#3 ASC NULLS FIRST, cs_item_sk#2 ASC NULLS FIRST], false, 0
 
-(89) ReusedExchange [Reuses operator id: 22]
+(85) ReusedExchange [Reuses operator id: 22]
 Output [4]: [cr_item_sk#17, cr_order_number#18, cr_return_quantity#19, cr_return_amount#20]
 
-(90) Sort [codegen id : 33]
+(86) Sort [codegen id : 31]
 Input [4]: [cr_item_sk#17, cr_order_number#18, cr_return_quantity#19, cr_return_amount#20]
 Arguments: [cr_order_number#18 ASC NULLS FIRST, cr_item_sk#17 ASC NULLS FIRST], false, 0
 
-(91) SortMergeJoin
+(87) SortMergeJoin
 Left keys [2]: [cs_order_number#3, cs_item_sk#2]
 Right keys [2]: [cr_order_number#18, cr_item_sk#17]
 Join condition: None
 
-(92) Project [codegen id : 34]
-Output [7]: [d_year#68, i_brand_id#63, i_class_id#64, i_category_id#65, i_manufact_id#66, (cs_quantity#4 - coalesce(cr_return_quantity#19, 0)) AS sales_cnt#22, CheckOverflow((promote_precision(cast(cs_ext_sales_price#5 as decimal(8,2))) - promote_precision(cast(coalesce(cr_return_amount#20, 0.00) as decimal(8,2)))), DecimalType(8,2), true) AS sales_amt#23]
-Input [13]: [cs_item_sk#2, cs_order_number#3, cs_quantity#4, cs_ext_sales_price#5, i_brand_id#63, i_class_id#64, i_category_id#65, i_manufact_id#66, d_year#68, cr_item_sk#17, cr_order_number#18, cr_return_quantity#19, cr_return_amount#20]
+(88) Project [codegen id : 32]
+Output [7]: [d_year#67, i_brand_id#62, i_class_id#63, i_category_id#64, i_manufact_id#65, (cs_quantity#4 - coalesce(cr_return_quantity#19, 0)) AS sales_cnt#22, CheckOverflow((promote_precision(cast(cs_ext_sales_price#5 as decimal(8,2))) - promote_precision(cast(coalesce(cr_return_amount#20, 0.00) as decimal(8,2)))), DecimalType(8,2), true) AS sales_amt#23]
+Input [13]: [cs_item_sk#2, cs_order_number#3, cs_quantity#4, cs_ext_sales_price#5, i_brand_id#62, i_class_id#63, i_category_id#64, i_manufact_id#65, d_year#67, cr_item_sk#17, cr_order_number#18, cr_return_quantity#19, cr_return_amount#20]
 
-(93) Scan parquet default.store_sales
+(89) Scan parquet default.store_sales
 Output [5]: [ss_sold_date_sk#24, ss_item_sk#25, ss_ticket_number#26, ss_quantity#27, ss_ext_sales_price#28]
 Batched: true
 Location [not included in comparison]/{warehouse_dir}/store_sales]
 PushedFilters: [IsNotNull(ss_item_sk), IsNotNull(ss_sold_date_sk)]
 ReadSchema: struct<ss_sold_date_sk:int,ss_item_sk:int,ss_ticket_number:int,ss_quantity:int,ss_ext_sales_price:decimal(7,2)>
 
-(94) ColumnarToRow [codegen id : 37]
+(90) ColumnarToRow [codegen id : 35]
 Input [5]: [ss_sold_date_sk#24, ss_item_sk#25, ss_ticket_number#26, ss_quantity#27, ss_ext_sales_price#28]
 
-(95) Filter [codegen id : 37]
+(91) Filter [codegen id : 35]
 Input [5]: [ss_sold_date_sk#24, ss_item_sk#25, ss_ticket_number#26, ss_quantity#27, ss_ext_sales_price#28]
 Condition : (isnotnull(ss_item_sk#25) AND isnotnull(ss_sold_date_sk#24))
 
-(96) ReusedExchange [Reuses operator id: 8]
-Output [5]: [i_item_sk#62, i_brand_id#63, i_class_id#64, i_category_id#65, i_manufact_id#66]
+(92) ReusedExchange [Reuses operator id: 8]
+Output [5]: [i_item_sk#61, i_brand_id#62, i_class_id#63, i_category_id#64, i_manufact_id#65]
 
-(97) BroadcastHashJoin [codegen id : 37]
+(93) BroadcastHashJoin [codegen id : 35]
 Left keys [1]: [ss_item_sk#25]
-Right keys [1]: [i_item_sk#62]
+Right keys [1]: [i_item_sk#61]
 Join condition: None
 
-(98) Project [codegen id : 37]
-Output [9]: [ss_sold_date_sk#24, ss_item_sk#25, ss_ticket_number#26, ss_quantity#27, ss_ext_sales_price#28, i_brand_id#63, i_class_id#64, i_category_id#65, i_manufact_id#66]
-Input [10]: [ss_sold_date_sk#24, ss_item_sk#25, ss_ticket_number#26, ss_quantity#27, ss_ext_sales_price#28, i_item_sk#62, i_brand_id#63, i_class_id#64, i_category_id#65, i_manufact_id#66]
+(94) Project [codegen id : 35]
+Output [9]: [ss_sold_date_sk#24, ss_item_sk#25, ss_ticket_number#26, ss_quantity#27, ss_ext_sales_price#28, i_brand_id#62, i_class_id#63, i_category_id#64, i_manufact_id#65]
+Input [10]: [ss_sold_date_sk#24, ss_item_sk#25, ss_ticket_number#26, ss_quantity#27, ss_ext_sales_price#28, i_item_sk#61, i_brand_id#62, i_class_id#63, i_category_id#64, i_manufact_id#65]
 
-(99) ReusedExchange [Reuses operator id: 84]
-Output [2]: [d_date_sk#67, d_year#68]
+(95) ReusedExchange [Reuses operator id: 80]
+Output [2]: [d_date_sk#66, d_year#67]
 
-(100) BroadcastHashJoin [codegen id : 37]
+(96) BroadcastHashJoin [codegen id : 35]
 Left keys [1]: [ss_sold_date_sk#24]
-Right keys [1]: [d_date_sk#67]
+Right keys [1]: [d_date_sk#66]
 Join condition: None
 
-(101) Project [codegen id : 37]
-Output [9]: [ss_item_sk#25, ss_ticket_number#26, ss_quantity#27, ss_ext_sales_price#28, i_brand_id#63, i_class_id#64, i_category_id#65, i_manufact_id#66, d_year#68]
-Input [11]: [ss_sold_date_sk#24, ss_item_sk#25, ss_ticket_number#26, ss_quantity#27, ss_ext_sales_price#28, i_brand_id#63, i_class_id#64, i_category_id#65, i_manufact_id#66, d_date_sk#67, d_year#68]
+(97) Project [codegen id : 35]
+Output [9]: [ss_item_sk#25, ss_ticket_number#26, ss_quantity#27, ss_ext_sales_price#28, i_brand_id#62, i_class_id#63, i_category_id#64, i_manufact_id#65, d_year#67]
+Input [11]: [ss_sold_date_sk#24, ss_item_sk#25, ss_ticket_number#26, ss_quantity#27, ss_ext_sales_price#28, i_brand_id#62, i_class_id#63, i_category_id#64, i_manufact_id#65, d_date_sk#66, d_year#67]
 
-(102) Exchange
-Input [9]: [ss_item_sk#25, ss_ticket_number#26, ss_quantity#27, ss_ext_sales_price#28, i_brand_id#63, i_class_id#64, i_category_id#65, i_manufact_id#66, d_year#68]
-Arguments: hashpartitioning(cast(ss_ticket_number#26 as bigint), cast(ss_item_sk#25 as bigint), 5), true, [id=#71]
+(98) Exchange
+Input [9]: [ss_item_sk#25, ss_ticket_number#26, ss_quantity#27, ss_ext_sales_price#28, i_brand_id#62, i_class_id#63, i_category_id#64, i_manufact_id#65, d_year#67]
+Arguments: hashpartitioning(cast(ss_ticket_number#26 as bigint), cast(ss_item_sk#25 as bigint), 5), ENSURE_REQUIREMENTS, [id=#70]
 
-(103) Sort [codegen id : 38]
-Input [9]: [ss_item_sk#25, ss_ticket_number#26, ss_quantity#27, ss_ext_sales_price#28, i_brand_id#63, i_class_id#64, i_category_id#65, i_manufact_id#66, d_year#68]
+(99) Sort [codegen id : 36]
+Input [9]: [ss_item_sk#25, ss_ticket_number#26, ss_quantity#27, ss_ext_sales_price#28, i_brand_id#62, i_class_id#63, i_category_id#64, i_manufact_id#65, d_year#67]
 Arguments: [cast(ss_ticket_number#26 as bigint) ASC NULLS FIRST, cast(ss_item_sk#25 as bigint) ASC NULLS FIRST], false, 0
 
-(104) ReusedExchange [Reuses operator id: 40]
+(100) ReusedExchange [Reuses operator id: 40]
 Output [4]: [sr_item_sk#30, sr_ticket_number#31, sr_return_quantity#32, sr_return_amt#33]
 
-(105) Sort [codegen id : 40]
+(101) Sort [codegen id : 38]
 Input [4]: [sr_item_sk#30, sr_ticket_number#31, sr_return_quantity#32, sr_return_amt#33]
 Arguments: [sr_ticket_number#31 ASC NULLS FIRST, sr_item_sk#30 ASC NULLS FIRST], false, 0
 
-(106) SortMergeJoin
+(102) SortMergeJoin
 Left keys [2]: [cast(ss_ticket_number#26 as bigint), cast(ss_item_sk#25 as bigint)]
 Right keys [2]: [sr_ticket_number#31, sr_item_sk#30]
 Join condition: None
 
-(107) Project [codegen id : 41]
-Output [7]: [d_year#68, i_brand_id#63, i_class_id#64, i_category_id#65, i_manufact_id#66, (ss_quantity#27 - coalesce(sr_return_quantity#32, 0)) AS sales_cnt#72, CheckOverflow((promote_precision(cast(ss_ext_sales_price#28 as decimal(8,2))) - promote_precision(cast(coalesce(sr_return_amt#33, 0.00) as decimal(8,2)))), DecimalType(8,2), true) AS sales_amt#73]
-Input [13]: [ss_item_sk#25, ss_ticket_number#26, ss_quantity#27, ss_ext_sales_price#28, i_brand_id#63, i_class_id#64, i_category_id#65, i_manufact_id#66, d_year#68, sr_item_sk#30, sr_ticket_number#31, sr_return_quantity#32, sr_return_amt#33]
-
-(108) Union
-
-(109) HashAggregate [codegen id : 42]
-Input [7]: [d_year#68, i_brand_id#63, i_class_id#64, i_category_id#65, i_manufact_id#66, sales_cnt#22, sales_amt#23]
-Keys [7]: [d_year#68, i_brand_id#63, i_class_id#64, i_category_id#65, i_manufact_id#66, sales_cnt#22, sales_amt#23]
-Functions: []
-Aggregate Attributes: []
-Results [7]: [d_year#68, i_brand_id#63, i_class_id#64, i_category_id#65, i_manufact_id#66, sales_cnt#22, sales_amt#23]
-
-(110) Exchange
-Input [7]: [d_year#68, i_brand_id#63, i_class_id#64, i_category_id#65, i_manufact_id#66, sales_cnt#22, sales_amt#23]
-Arguments: hashpartitioning(d_year#68, i_brand_id#63, i_class_id#64, i_category_id#65, i_manufact_id#66, sales_cnt#22, sales_amt#23, 5), true, [id=#74]
-
-(111) HashAggregate [codegen id : 43]
-Input [7]: [d_year#68, i_brand_id#63, i_class_id#64, i_category_id#65, i_manufact_id#66, sales_cnt#22, sales_amt#23]
-Keys [7]: [d_year#68, i_brand_id#63, i_class_id#64, i_category_id#65, i_manufact_id#66, sales_cnt#22, sales_amt#23]
-Functions: []
-Aggregate Attributes: []
-Results [7]: [d_year#68, i_brand_id#63, i_class_id#64, i_category_id#65, i_manufact_id#66, sales_cnt#22, sales_amt#23]

Review comment:
       All the changes to the plans follow this same 4 node pattern:
   - The `Union` is merged with the one above it in the tree.
   - The duplicate `Distinct` (aggregate - exchange - aggregate) is removed and its job is done by the one above the merged `Union` 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org