Posted to issues@spark.apache.org by "Dilip Biswal (Jira)" <ji...@apache.org> on 2019/09/16 04:54:00 UTC
[jira] [Commented] (SPARK-29092) EXPLAIN FORMATTED does not work well with DPP

    [ https://issues.apache.org/jira/browse/SPARK-29092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930237#comment-16930237 ] 

Dilip Biswal commented on SPARK-29092:
--------------------------------------

I am looking into this.

> EXPLAIN FORMATTED does not work well with DPP
> ---------------------------------------------
>
>                 Key: SPARK-29092
>                 URL: https://issues.apache.org/jira/browse/SPARK-29092
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Xiao Li
>            Priority: Major
>
>  
> {code:java}
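> // withSQLConf and withTable are Spark's SQL test helpers; tableFormat is
> // defined by the enclosing test suite (Parquet, per the plans below).
> // Enable DPP, but keep the pruning filter from reusing the join's broadcast,
> // so it is planned as a separate subquery (subquery2741 in the EXTENDED plan).
> withSQLConf(SQLConf.DYNAMIC_PARTITION_PRUNING_ENABLED.key -> "true",
>   SQLConf.DYNAMIC_PARTITION_PRUNING_REUSE_BROADCAST.key -> "false") {
>   withTable("df1", "df2") {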
>     spark.range(1000)
>       .select(col("id"), col("id").as("k"))
>       .write
>       .partitionBy("k")
>       .format(tableFormat)
>       .mode("overwrite")
>       .saveAsTable("df1")
>     spark.range(100)
>       .select(col("id"), col("id").as("k"))
>       .write
>       .partitionBy("k")
>       .format(tableFormat)
>       .mode("overwrite")
>       .saveAsTable("df2")
>     // Same query under both EXPLAIN modes; only the plan rendering differs.
>     sql("EXPLAIN FORMATTED SELECT df1.id, df2.k FROM df1 JOIN df2 ON df1.k = df2.k AND df2.id < 2")
>       .show(false)
>     sql("EXPLAIN EXTENDED SELECT df1.id, df2.k FROM df1 JOIN df2 ON df1.k = df2.k AND df2.id < 2")
>       .show(false)
>   }
> }
> {code}
> The output of EXPLAIN EXTENDED is as expected:
> {code:java}
> == Physical Plan ==
> *(2) Project [id#2721L, k#2724L]
> +- *(2) BroadcastHashJoin [k#2722L], [k#2724L], Inner, BuildRight
>    :- *(2) ColumnarToRow
>    :  +- FileScan parquet default.df1[id#2721L,k#2722L] Batched: true, DataFilters: [], Format: Parquet, Location: PrunedInMemoryFileIndex[file:/Users/lixiao/IdeaProjects/spark/sql/core/spark-warehouse/org.apache..., PartitionFilters: [isnotnull(k#2722L), dynamicpruningexpression(k#2722L IN subquery2741)], PushedFilters: [], ReadSchema: struct<id:bigint>
>    :        +- Subquery subquery2741, [id=#358]
>    :           +- *(2) HashAggregate(keys=[k#2724L], functions=[], output=[k#2724L#2740L])
>    :              +- Exchange hashpartitioning(k#2724L, 5), true, [id=#354]
>    :                 +- *(1) HashAggregate(keys=[k#2724L], functions=[], output=[k#2724L])
>    :                    +- *(1) Project [k#2724L]
>    :                       +- *(1) Filter (isnotnull(id#2723L) AND (id#2723L < 2))
>    :                          +- *(1) ColumnarToRow
>    :                             +- FileScan parquet default.df2[id#2723L,k#2724L] Batched: true, DataFilters: [isnotnull(id#2723L), (id#2723L < 2)], Format: Parquet, Location: PrunedInMemoryFileIndex[file:/Users/lixiao/IdeaProjects/spark/sql/core/spark-warehouse/org.apache..., PartitionFilters: [isnotnull(k#2724L)], PushedFilters: [IsNotNull(id), LessThan(id,2)], ReadSchema: struct<id:bigint>
>    +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, true])), [id=#379]
>       +- *(1) Project [k#2724L]
>          +- *(1) Filter (isnotnull(id#2723L) AND (id#2723L < 2))
>             +- *(1) ColumnarToRow
>                +- FileScan parquet default.df2[id#2723L,k#2724L] Batched: true, DataFilters: [isnotnull(id#2723L), (id#2723L < 2)], Format: Parquet, Location: PrunedInMemoryFileIndex[file:/Users/lixiao/IdeaProjects/spark/sql/core/spark-warehouse/org.apache..., PartitionFilters: [isnotnull(k#2724L)], PushedFilters: [IsNotNull(id), LessThan(id,2)], ReadSchema: struct<id:bigint>
> {code}
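In the EXTENDED plan, the effect of DPP is the dynamicpruningexpression entry in the PartitionFilters of the df1 scan. As a rough sketch of checking for it programmatically, the Scala helper below (hypothetical, not a Spark API) parses one rendered FileScan line, assuming the "PartitionFilters: [...]" layout shown above. It splits that list on top-level commas, ignoring commas nested inside parentheses or brackets, and reports whether any entry is a dynamic pruning expression.

```scala
// Hypothetical helper, not part of Spark: works on rendered EXPLAIN text only,
// assuming the "PartitionFilters: [...]" layout of EXPLAIN EXTENDED output.
object PlanText {
  /** Returns the top-level entries of "PartitionFilters: [...]", or Nil if absent. */
  def partitionFilters(scanLine: String): List[String] = {
    val marker = "PartitionFilters: ["
    val start = scanLine.indexOf(marker)
    if (start < 0) return Nil
    val entries = scala.collection.mutable.ListBuffer.empty[String]
    val current = new StringBuilder
    var depth = 0 // nesting level of ( and [ inside the filter list
    var i = start + marker.length
    var done = false
    while (!done && i < scanLine.length) {
      val c = scanLine.charAt(i)
      c match {
        case ']' if depth == 0 => done = true // closing bracket of the list
        case ',' if depth == 0 =>             // separator between two filters
          entries += current.toString.trim
          current.clear()
        case '(' | '[' => depth += 1; current.append(c)
        case ')' | ']' => depth -= 1; current.append(c)
        case _         => current.append(c)
      }
      i += 1
    }
    if (current.toString.trim.nonEmpty) entries += current.toString.trim
    entries.toList
  }

  /** True if any partition filter on this scan line is a dynamic pruning expression. */
  def hasDynamicPruning(scanLine: String): Boolean =
    partitionFilters(scanLine).exists(_.startsWith("dynamicpruningexpression("))
}
```

Applied to the df1 scan line above, it should yield isnotnull(k#2722L) and dynamicpruningexpression(k#2722L IN subquery2741). For the df2 scans, which carry only isnotnull(k#2724L), it reports no dynamic pruning. Parsing plan text is brittle. Inside Spark's own test suites, inspecting the executed plan tree is likely the sturdier option.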
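> However, in the EXPLAIN FORMATTED output the FileScan node does not show the effect of DPP. For example, the details for node (1) list only its Output, with no dynamicpruningexpression partition filter or subquery: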
> {code:java}
> * Project (9)
> +- * BroadcastHashJoin Inner BuildRight (8)
>    :- * ColumnarToRow (2)
>    :  +- Scan parquet default.df1 (1)
>    +- BroadcastExchange (7)
>       +- * Project (6)
>          +- * Filter (5)
>             +- * ColumnarToRow (4)
>                +- Scan parquet default.df2 (3)
> (1) Scan parquet default.df1 
> Output: [id#2716L, k#2717L]
> {code}
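A regression check for this issue could compare the two renderings of the same query: if the EXPLAIN EXTENDED text contains a dynamic pruning filter, the EXPLAIN FORMATTED text should contain one too. The following is a minimal Scala sketch that uses only the plan strings. The object and method names are illustrative, not Spark APIs, and the check assumes the filter is rendered as "dynamicpruningexpression(" in both modes.

```scala
// Illustrative check, not a Spark API: compares rendered EXPLAIN strings only.
object DppExplainCheck {
  private val Marker = "dynamicpruningexpression("

  /** Counts non-overlapping occurrences of the DPP marker in the plan text. */
  def dppMentions(plan: String): Int = {
    var count = 0
    var from = plan.indexOf(Marker)
    while (from >= 0) {
      count += 1
      from = plan.indexOf(Marker, from + Marker.length)
    }
    count
  }

  /** True when the EXTENDED text shows DPP but the FORMATTED text does not. */
  def formattedDropsDpp(extended: String, formatted: String): Boolean =
    dppMentions(extended) > 0 && dppMentions(formatted) == 0
}
```

With the two outputs quoted in this issue, formattedDropsDpp would return true. After a fix, it should return false.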
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
