Posted to issues@spark.apache.org by "Dilip Biswal (Jira)" <ji...@apache.org> on 2019/09/16 04:54:00 UTC
[jira] [Commented] (SPARK-29092) EXPLAIN FORMATTED does not work well with DPP
[ https://issues.apache.org/jira/browse/SPARK-29092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930237#comment-16930237 ]
Dilip Biswal commented on SPARK-29092:
--------------------------------------
I am looking into this.
> EXPLAIN FORMATTED does not work well with DPP
> ---------------------------------------------
>
> Key: SPARK-29092
> URL: https://issues.apache.org/jira/browse/SPARK-29092
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Xiao Li
> Priority: Major
>
>
> {code:java}
> withSQLConf(SQLConf.DYNAMIC_PARTITION_PRUNING_ENABLED.key -> "true",
>   SQLConf.DYNAMIC_PARTITION_PRUNING_REUSE_BROADCAST.key -> "false") {
>   withTable("df1", "df2") {
>     spark.range(1000)
>       .select(col("id"), col("id").as("k"))
>       .write
>       .partitionBy("k")
>       .format(tableFormat)
>       .mode("overwrite")
>       .saveAsTable("df1")
>     spark.range(100)
>       .select(col("id"), col("id").as("k"))
>       .write
>       .partitionBy("k")
>       .format(tableFormat)
>       .mode("overwrite")
>       .saveAsTable("df2")
>     sql("EXPLAIN FORMATTED SELECT df1.id, df2.k FROM df1 JOIN df2 ON df1.k = df2.k AND df2.id < 2")
>       .show(false)
>     sql("EXPLAIN EXTENDED SELECT df1.id, df2.k FROM df1 JOIN df2 ON df1.k = df2.k AND df2.id < 2")
>       .show(false)
>   }
> }
> {code}
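> The output of EXPLAIN EXTENDED is as expected: the FileScan of df1 lists the dynamicpruningexpression under PartitionFilters, followed by the pruning subquery.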
> {code:java}
> == Physical Plan ==
> *(2) Project [id#2721L, k#2724L]
> +- *(2) BroadcastHashJoin [k#2722L], [k#2724L], Inner, BuildRight
>    :- *(2) ColumnarToRow
>    :  +- FileScan parquet default.df1[id#2721L,k#2722L] Batched: true, DataFilters: [], Format: Parquet, Location: PrunedInMemoryFileIndex[file:/Users/lixiao/IdeaProjects/spark/sql/core/spark-warehouse/org.apache..., PartitionFilters: [isnotnull(k#2722L), dynamicpruningexpression(k#2722L IN subquery2741)], PushedFilters: [], ReadSchema: struct<id:bigint>
>    :        +- Subquery subquery2741, [id=#358]
>    :           +- *(2) HashAggregate(keys=[k#2724L], functions=[], output=[k#2724L#2740L])
>    :              +- Exchange hashpartitioning(k#2724L, 5), true, [id=#354]
>    :                 +- *(1) HashAggregate(keys=[k#2724L], functions=[], output=[k#2724L])
>    :                    +- *(1) Project [k#2724L]
>    :                       +- *(1) Filter (isnotnull(id#2723L) AND (id#2723L < 2))
>    :                          +- *(1) ColumnarToRow
>    :                             +- FileScan parquet default.df2[id#2723L,k#2724L] Batched: true, DataFilters: [isnotnull(id#2723L), (id#2723L < 2)], Format: Parquet, Location: PrunedInMemoryFileIndex[file:/Users/lixiao/IdeaProjects/spark/sql/core/spark-warehouse/org.apache..., PartitionFilters: [isnotnull(k#2724L)], PushedFilters: [IsNotNull(id), LessThan(id,2)], ReadSchema: struct<id:bigint>
>    +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, true])), [id=#379]
>       +- *(1) Project [k#2724L]
>          +- *(1) Filter (isnotnull(id#2723L) AND (id#2723L < 2))
>             +- *(1) ColumnarToRow
>                +- FileScan parquet default.df2[id#2723L,k#2724L] Batched: true, DataFilters: [isnotnull(id#2723L), (id#2723L < 2)], Format: Parquet, Location: PrunedInMemoryFileIndex[file:/Users/lixiao/IdeaProjects/spark/sql/core/spark-warehouse/org.apache..., PartitionFilters: [isnotnull(k#2724L)], PushedFilters: [IsNotNull(id), LessThan(id,2)], ReadSchema: struct<id:bigint>
> {code}
> However, in the EXPLAIN FORMATTED output, the FileScan node does not show the effect of DPP:
> {code:java}
> * Project (9)
> +- * BroadcastHashJoin Inner BuildRight (8)
>    :- * ColumnarToRow (2)
>    :  +- Scan parquet default.df1 (1)
>    +- BroadcastExchange (7)
>       +- * Project (6)
>          +- * Filter (5)
>             +- * ColumnarToRow (4)
>                +- Scan parquet default.df2 (3)
>
> (1) Scan parquet default.df1
> Output: [id#2716L, k#2717L]
> {code}
>
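
For anyone comparing the two outputs by hand, here is a rough sketch of a check that a plan string mentions DPP at all. DppExplainCheck, dppLines and mentionsDpp are hypothetical names for illustration, not part of Spark; the marker string is simply the one visible in the EXPLAIN EXTENDED output quoted above.

```scala
// Hypothetical helper, not part of Spark: scans EXPLAIN text for the DPP marker.
object DppExplainCheck {
  // Marker as it appears in the PartitionFilters of the EXPLAIN EXTENDED output.
  val DppMarker = "dynamicpruningexpression"

  /** Lines of the plan text that mention a dynamic pruning expression. */
  def dppLines(planText: String): Seq[String] =
    planText.split("\n").toSeq.filter(_.toLowerCase.contains(DppMarker))

  /** True if the plan text shows the effect of DPP anywhere. */
  def mentionsDpp(planText: String): Boolean = dppLines(planText).nonEmpty
}
```

Assuming the EXPLAIN result comes back as a single row holding the plan text, something like DppExplainCheck.mentionsDpp(sql("EXPLAIN FORMATTED ...").head().getString(0)) should return false for the FORMATTED output quoted above and true for the EXTENDED one.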
--
This message was sent by Atlassian Jira
(v8.3.2#803003)