You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Wenchen Fan (JIRA)" <ji...@apache.org> on 2016/09/10 01:29:20 UTC
[jira] [Updated] (SPARK-15453) Improve join planning for bucketed /
sorted tables
[ https://issues.apache.org/jira/browse/SPARK-15453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan updated SPARK-15453:
--------------------------------
Assignee: Tejas Patil
> Improve join planning for bucketed / sorted tables
> --------------------------------------------------
>
> Key: SPARK-15453
> URL: https://issues.apache.org/jira/browse/SPARK-15453
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Reporter: Tejas Patil
> Assignee: Tejas Patil
> Priority: Minor
> Fix For: 2.1.0
>
>
> Datasource allows creation of bucketed and sorted tables but performing joins on such tables still does not utilize this metadata to produce optimal query plan.
> As below, the `Exchange` and `Sort` can be avoided if the tables are known to be hashed + sorted on relevant columns.
> {noformat}
> == Physical Plan ==
> WholeStageCodegen
> : +- SortMergeJoin [j#20,k#21,i#22], [j#23,k#24,i#25], Inner, None
> : :- INPUT
> : +- INPUT
> :- WholeStageCodegen
> : : +- Sort [j#20 ASC,k#21 ASC,i#22 ASC], false, 0
> : : +- INPUT
> : +- Exchange hashpartitioning(j#20, k#21, i#22, 200), None
> : +- WholeStageCodegen
> : : +- Project [j#20,k#21,i#22]
> : : +- Filter (isnotnull(k#21) && isnotnull(j#20))
> : : +- Scan orc default.table7[j#20,k#21,i#22] Format: ORC, InputPaths: file:/XXXXXXX/table7, PushedFilters: [IsNotNull(k), IsNotNull(j)], ReadSchema: struct<j:int,k:string>
> +- WholeStageCodegen
> : +- Sort [j#23 ASC,k#24 ASC,i#25 ASC], false, 0
> : +- INPUT
> +- Exchange hashpartitioning(j#23, k#24, i#25, 200), None
> +- WholeStageCodegen
> : +- Project [j#23,k#24,i#25]
> : +- Filter (isnotnull(k#24) && isnotnull(j#23))
> : +- Scan orc default.table8[j#23,k#24,i#25] Format: ORC, InputPaths: file:/XXXXXXX/table8, PushedFilters: [IsNotNull(k), IsNotNull(j)], ReadSchema: struct<j:int,k:string>
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org