Posted to issues@spark.apache.org by "jiaan.geng (Jira)" <ji...@apache.org> on 2023/07/27 11:10:00 UTC

[jira] [Created] (SPARK-44571) Eliminate the Join by Combine multiple Aggregates

jiaan.geng created SPARK-44571:
----------------------------------

             Summary: Eliminate the Join by Combine multiple Aggregates
                 Key: SPARK-44571
                 URL: https://issues.apache.org/jira/browse/SPARK-44571
             Project: Spark
          Issue Type: New Feature
          Components: SQL
    Affects Versions: 3.5.0
            Reporter: jiaan.geng


Recently, I investigated test case q28, which is one of the TPC-DS queries.

The query contains multiple scalar subqueries, each with its own filter and aggregation, connected by inner joins.
If we can merge the filters and aggregates, we can scan the data source only once and eliminate the joins, which also avoids the shuffle they require. This change should improve performance.
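
To illustrate the idea, here is a minimal sketch of the rewrite at the SQL level. It is not the proposed Spark optimizer rule: it uses Python's built-in sqlite3 module as a stand-in engine, a tiny made-up table whose name and columns (store_sales, ss_quantity, ss_list_price) are only borrowed from the TPC-DS schema, and two filter buckets chosen for illustration. The first query has one aggregate subquery per filter, joined without a condition; the second computes the same values in a single scan by moving each filter into a CASE WHEN inside the aggregate and OR-ing the filters in the WHERE clause.

```python
import sqlite3

# In-memory database with a tiny, made-up sample of rows (assumption:
# the table and column names merely mimic TPC-DS store_sales).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE store_sales (ss_quantity INTEGER, ss_list_price REAL)")
conn.executemany(
    "INSERT INTO store_sales VALUES (?, ?)",
    [(1, 10.0), (3, 20.0), (5, 30.0), (7, 40.0), (9, 50.0), (12, 60.0)],
)

# Original shape: each bucket is its own filtered aggregate over the
# same table, and the single-row results are joined together.
joined = conn.execute("""
    SELECT b1.avg_price, b1.cnt, b2.avg_price, b2.cnt
    FROM (SELECT AVG(ss_list_price) AS avg_price,
                 COUNT(ss_list_price) AS cnt
          FROM store_sales
          WHERE ss_quantity BETWEEN 0 AND 5) AS b1
    CROSS JOIN
         (SELECT AVG(ss_list_price) AS avg_price,
                 COUNT(ss_list_price) AS cnt
          FROM store_sales
          WHERE ss_quantity BETWEEN 6 AND 10) AS b2
""").fetchone()

# Merged shape: one scan, no join. Each filter becomes a CASE WHEN
# inside its aggregate (non-matching rows yield NULL, which AVG and
# COUNT ignore), and the WHERE clause keeps the union of the filters.
merged = conn.execute("""
    SELECT AVG(CASE WHEN ss_quantity BETWEEN 0 AND 5
                    THEN ss_list_price END),
           COUNT(CASE WHEN ss_quantity BETWEEN 0 AND 5
                      THEN ss_list_price END),
           AVG(CASE WHEN ss_quantity BETWEEN 6 AND 10
                    THEN ss_list_price END),
           COUNT(CASE WHEN ss_quantity BETWEEN 6 AND 10
                      THEN ss_list_price END)
    FROM store_sales
    WHERE ss_quantity BETWEEN 0 AND 5
       OR ss_quantity BETWEEN 6 AND 10
""").fetchone()

print(joined)
print(merged)
```

Both queries return the same row for this sample data. The rewrite relies on the aggregates ignoring NULL inputs, so it would need extra care for aggregates or expressions where that does not hold.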



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org