Posted to issues@spark.apache.org by "Takeshi Yamamuro (JIRA)" <ji...@apache.org> on 2018/01/12 06:03:00 UTC

[jira] [Commented] (SPARK-23021) AnalysisBarrier should not cut off the explain output for Parsed Logical Plan

    [ https://issues.apache.org/jira/browse/SPARK-23021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16323599#comment-16323599 ] 

Takeshi Yamamuro commented on SPARK-23021:
------------------------------------------

Hi Kris, are you working on this?
I think we just forgot to override innerChildren in AnalysisBarrier. With that override in place, the wrapped plan shows up again in the Parsed Logical Plan:
{code}

>> Before
scala> Seq((1, 1)).toDF("a", "b").groupBy("a").count().sample(0.1).explain(true)
== Parsed Logical Plan ==
Sample 0.0, 0.1, false, -7661439431999668039
+- AnalysisBarrier Aggregate [a#5], [a#5, count(1) AS count#14L]

>> After
scala> Seq((1, 1)).toDF("a", "b").groupBy("a").count().sample(0.1).explain(true)
== Parsed Logical Plan ==
Sample 0.0, 0.1, false, -5086223488015741426
+- AnalysisBarrier
      +- Aggregate [a#5], [a#5, count(1) AS count#14L]
         +- Project [_1#2 AS a#5, _2#3 AS b#6]
            +- LocalRelation [_1#2, _2#3]
{code}
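
To illustrate why the missing override truncates the output, here is a minimal, self-contained sketch. It does not use Spark at all: `Node`, `Leaf`, `Unary`, `BarrierBefore` and `BarrierAfter` are hypothetical names made up for this example, and the rendering is a simplified stand-in for the real tree printer, so the indentation will not match Spark's exactly. The only assumption carried over from the comment above is that a barrier node exposes no regular children and that the printer also walks `innerChildren`.

```scala
object BarrierSketch {
  // Toy model of a plan tree. This is NOT Spark's TreeNode, just enough to show the idea.
  sealed trait Node {
    def label: String
    // Regular children, walked by the tree printer.
    def children: Seq[Node] = Nil
    // Nodes held as fields rather than as regular children; empty unless overridden.
    def innerChildren: Seq[Node] = Nil

    // Render this node, then everything reachable through innerChildren and children.
    def treeString(depth: Int = 0): String = {
      val prefix = if (depth == 0) "" else ("   " * (depth - 1)) + "+- "
      val below = (innerChildren ++ children).map(_.treeString(depth + 1))
      ((prefix + label) +: below).mkString("\n")
    }
  }

  final case class Leaf(label: String) extends Node

  final case class Unary(label: String, child: Node) extends Node {
    override def children: Seq[Node] = Seq(child)
  }

  // "Before": the barrier wraps a plan but reports no children of any kind,
  // so the printer stops here.
  final case class BarrierBefore(child: Node) extends Node {
    def label: String = "AnalysisBarrier"
  }

  // "After": the wrapped plan is exposed through innerChildren only,
  // so the node still has no regular children but the printer can descend.
  final case class BarrierAfter(child: Node) extends Node {
    def label: String = "AnalysisBarrier"
    override def innerChildren: Seq[Node] = Seq(child)
  }

  val inner: Node = Unary("Aggregate", Unary("Project", Leaf("LocalRelation")))
  val before: Node = Unary("Sample", BarrierBefore(inner))
  val after: Node = Unary("Sample", BarrierAfter(inner))

  def main(args: Array[String]): Unit = {
    println(before.treeString()) // stops at the barrier
    println()
    println(after.treeString())  // continues into the wrapped plan
  }
}
```

In this toy model the "before" tree prints only the `Sample` and `AnalysisBarrier` lines, while the "after" tree also prints the `Aggregate`, `Project` and `LocalRelation` nodes under the barrier. Whether the real fix needs anything beyond the override is not something this sketch can confirm.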

> AnalysisBarrier should not cut off the explain output for Parsed Logical Plan
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-23021
>                 URL: https://issues.apache.org/jira/browse/SPARK-23021
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Kris Mok
>
> In PR#20094, a follow-up to SPARK-20392, there were some fixes to the handling of {{AnalysisBarrier}}, but there seem to be more cases that need to be fixed.
> One such case is that the Parsed Logical Plan in the explain output is currently cut off at {{AnalysisBarrier}}, e.g.
> {code:none}
> scala> val df1 = spark.range(1).select('id as 'x, 'id + 1 as 'y).repartition(1).select('x === 'y)
> df1: org.apache.spark.sql.DataFrame = [(x = y): boolean]
> scala> df1.explain(true)
> == Parsed Logical Plan ==
> 'Project [('x = 'y) AS (x = y)#22]
> +- AnalysisBarrier Repartition 1, true
> == Analyzed Logical Plan ==
> (x = y): boolean
> Project [(x#16L = y#17L) AS (x = y)#22]
> +- Repartition 1, true
>    +- Project [id#13L AS x#16L, (id#13L + cast(1 as bigint)) AS y#17L]
>       +- Range (0, 1, step=1, splits=Some(8))
> == Optimized Logical Plan ==
> Project [(x#16L = y#17L) AS (x = y)#22]
> +- Repartition 1, true
>    +- Project [id#13L AS x#16L, (id#13L + 1) AS y#17L]
>       +- Range (0, 1, step=1, splits=Some(8))
> == Physical Plan ==
> *Project [(x#16L = y#17L) AS (x = y)#22]
> +- Exchange RoundRobinPartitioning(1)
>    +- *Project [id#13L AS x#16L, (id#13L + 1) AS y#17L]
>       +- *Range (0, 1, step=1, splits=8)
> {code}


