You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Yuming Wang (Jira)" <ji...@apache.org> on 2023/03/16 03:13:00 UTC

[jira] [Comment Edited] (SPARK-42513) Push down topK through join

    [ https://issues.apache.org/jira/browse/SPARK-42513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17700945#comment-17700945 ] 

Yuming Wang edited comment on SPARK-42513 at 3/16/23 3:12 AM:
--------------------------------------------------------------

TiDB({{mysql -u root  -h 127.0.0.1 -P 4000}}):

{code:sql}
create table t1(a int, b int) ;
create table t2(c int, d int) ;
create table t3(e int, f int);

desc select * from t1 left join t2 on a = c left join t3 on a = e order by b limit 1000;
{code}

{noformat}
mysql> desc select * from t1 left join t2 on a = c left join t3 on a = e order by b limit 1000;
+----------------------------------------+----------+-----------+---------------+-----------------------------------------------------+
| id                                     | estRows  | task      | access object | operator info                                       |
+----------------------------------------+----------+-----------+---------------+-----------------------------------------------------+
| TopN_17                                | 1000.00  | root      |               | test1.t1.b, offset:0, count:1000                    |
| └─HashJoin_23                          | 1250.00  | root      |               | left outer join, equal:[eq(test1.t1.a, test1.t3.e)] |
|   ├─TopN_26(Build)                     | 1000.00  | root      |               | test1.t1.b, offset:0, count:1000                    |
|   │ └─HashJoin_32                      | 1250.00  | root      |               | left outer join, equal:[eq(test1.t1.a, test1.t2.c)] |
|   │   ├─TopN_33(Build)                 | 1000.00  | root      |               | test1.t1.b, offset:0, count:1000                    |
|   │   │ └─TableReader_42               | 1000.00  | root      |               | data:TopN_41                                        |
|   │   │   └─TopN_41                    | 1000.00  | cop[tikv] |               | test1.t1.b, offset:0, count:1000                    |
|   │   │     └─TableFullScan_40         | 10000.00 | cop[tikv] | table:t1      | keep order:false, stats:pseudo                      |
|   │   └─TableReader_47(Probe)          | 9990.00  | root      |               | data:Selection_46                                   |
|   │     └─Selection_46                 | 9990.00  | cop[tikv] |               | not(isnull(test1.t2.c))                             |
|   │       └─TableFullScan_45           | 10000.00 | cop[tikv] | table:t2      | keep order:false, stats:pseudo                      |
|   └─TableReader_50(Probe)              | 9990.00  | root      |               | data:Selection_49                                   |
|     └─Selection_49                     | 9990.00  | cop[tikv] |               | not(isnull(test1.t3.e))                             |
|       └─TableFullScan_48               | 10000.00 | cop[tikv] | table:t3      | keep order:false, stats:pseudo                      |
+----------------------------------------+----------+-----------+---------------+-----------------------------------------------------+
14 rows in set (0.00 sec)
{noformat}




was (Author: q79969786):
TiDB:

{code:sql}
create table t1(a int, b int) ;
create table t2(c int, d int) ;
create table t3(e int, f int);

desc select * from t1 left join t2 on a = c left join t3 on a = e order by b limit 1000;
{code}

{noformat}
mysql> desc select * from t1 left join t2 on a = c left join t3 on a = e order by b limit 1000;
+----------------------------------------+----------+-----------+---------------+-----------------------------------------------------+
| id                                     | estRows  | task      | access object | operator info                                       |
+----------------------------------------+----------+-----------+---------------+-----------------------------------------------------+
| TopN_17                                | 1000.00  | root      |               | test1.t1.b, offset:0, count:1000                    |
| └─HashJoin_23                          | 1250.00  | root      |               | left outer join, equal:[eq(test1.t1.a, test1.t3.e)] |
|   ├─TopN_26(Build)                     | 1000.00  | root      |               | test1.t1.b, offset:0, count:1000                    |
|   │ └─HashJoin_32                      | 1250.00  | root      |               | left outer join, equal:[eq(test1.t1.a, test1.t2.c)] |
|   │   ├─TopN_33(Build)                 | 1000.00  | root      |               | test1.t1.b, offset:0, count:1000                    |
|   │   │ └─TableReader_42               | 1000.00  | root      |               | data:TopN_41                                        |
|   │   │   └─TopN_41                    | 1000.00  | cop[tikv] |               | test1.t1.b, offset:0, count:1000                    |
|   │   │     └─TableFullScan_40         | 10000.00 | cop[tikv] | table:t1      | keep order:false, stats:pseudo                      |
|   │   └─TableReader_47(Probe)          | 9990.00  | root      |               | data:Selection_46                                   |
|   │     └─Selection_46                 | 9990.00  | cop[tikv] |               | not(isnull(test1.t2.c))                             |
|   │       └─TableFullScan_45           | 10000.00 | cop[tikv] | table:t2      | keep order:false, stats:pseudo                      |
|   └─TableReader_50(Probe)              | 9990.00  | root      |               | data:Selection_49                                   |
|     └─Selection_49                     | 9990.00  | cop[tikv] |               | not(isnull(test1.t3.e))                             |
|       └─TableFullScan_48               | 10000.00 | cop[tikv] | table:t3      | keep order:false, stats:pseudo                      |
+----------------------------------------+----------+-----------+---------------+-----------------------------------------------------+
14 rows in set (0.00 sec)
{noformat}



> Push down topK through join
> ---------------------------
>
>                 Key: SPARK-42513
>                 URL: https://issues.apache.org/jira/browse/SPARK-42513
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Yuming Wang
>            Priority: Major
>         Attachments: after-UI.png, before-UI.png
>
>
> {code:scala}
>     spark.range(100000000).selectExpr("id % 10000 as a", "id as b").write.saveAsTable("t1")
>     spark.range(100000000).selectExpr("id % 10000 as x", "id as y").write.saveAsTable("t2")
>     sql("select * from t1 left join t2 on a = x order by b limit 5").collect()
>     spark.sql("set spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.LimitPushDown")
>     sql("select * from t1 left join t2 on a = x order by b limit 5").collect()
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org