Posted to issues@spark.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2020/03/03 20:26:00 UTC

[jira] [Resolved] (SPARK-30997) An analysis failure in generators with aggregate functions

     [ https://issues.apache.org/jira/browse/SPARK-30997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-30997.
-----------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

Issue resolved by pull request 27749
[https://github.com/apache/spark/pull/27749]

> An analysis failure in generators with aggregate functions
> ----------------------------------------------------------
>
>                 Key: SPARK-30997
>                 URL: https://issues.apache.org/jira/browse/SPARK-30997
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0, 3.1.0
>            Reporter: Takeshi Yamamuro
>            Assignee: Takeshi Yamamuro
>            Priority: Major
>             Fix For: 3.0.0
>
>
> SPARK-28782 added support for generators in SQL aggregate expressions.
> However, the equivalent generator (explode) query with aggregate functions fails when written with the DataFrame API, as follows;
> {code}
> // SPARK-28782: Generator support in aggregate expressions
> scala> spark.range(3).toDF("id").createOrReplaceTempView("t")
> scala> sql("select explode(array(min(id), max(id))) from t").show()
> +---+
> |col|
> +---+
> |  0|
> |  2|
> +---+
> // The failure case handled in this PR
> scala> spark.range(3).select(explode(array(min($"id"), max($"id")))).show()
> org.apache.spark.sql.AnalysisException:
> The query operator `Generate` contains one or more unsupported
> expression types Aggregate, Window or Generate.
> Invalid expressions: [min(`id`), max(`id`)];;
> Project [col#46L]
> +- Generate explode(array(min(id#42L), max(id#42L))), false, [col#46L]
>    +- Range (0, 3, step=1, splits=Some(4))
>   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:49)
>   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis$(CheckAnalysis.scala:48)
>   at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:129)
> {code}
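>
> As an aside (not part of the original report), a likely workaround on affected versions is to run the aggregation first, so that by the time `ExtractGenerator` fires, the generator only sees plain attribute references rather than aggregate functions:
> {code}
> // Workaround sketch (an assumption, not from the original report):
> // aggregate into named columns first, then explode the result.
> scala> val agg = spark.range(3).agg(min($"id").as("mn"), max($"id").as("mx"))
> scala> agg.select(explode(array($"mn", $"mx"))).show()
> // should print the same two rows (0 and 2) as the SQL query above
> {code}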
> The root cause is that the `ExtractGenerator` rule wrongly rewrites a `Project` containing aggregate functions
> before `GlobalAggregates` can replace it with an `Aggregate`, as the plan change log below shows;
> {code}
> scala> sql("SET spark.sql.optimizer.planChangeLog.level=warn")
> scala> spark.range(3).select(explode(array(min($"id"), max($"id")))).show()
> 20/03/01 12:51:58 WARN HiveSessionStateBuilder$$anon$1: 
> === Applying Rule org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences ===
> !'Project [explode(array(min('id), max('id))) AS List()]   'Project [explode(array(min(id#72L), max(id#72L))) AS List()]
>  +- Range (0, 3, step=1, splits=Some(4))                   +- Range (0, 3, step=1, splits=Some(4))
>            
> 20/03/01 12:51:58 WARN HiveSessionStateBuilder$$anon$1: 
> === Applying Rule org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator ===
> !'Project [explode(array(min(id#72L), max(id#72L))) AS List()]   Project [col#76L]
> !+- Range (0, 3, step=1, splits=Some(4))                         +- Generate explode(array(min(id#72L), max(id#72L))), false, [col#76L]
> !                                                                   +- Range (0, 3, step=1, splits=Some(4))
>            
> 20/03/01 12:51:58 WARN HiveSessionStateBuilder$$anon$1: 
> === Result of Batch Resolution ===
> !'Project [explode(array(min('id), max('id))) AS List()]   Project [col#76L]
> !+- Range (0, 3, step=1, splits=Some(4))                   +- Generate explode(array(min(id#72L), max(id#72L))), false, [col#76L]
> !                                                             +- Range (0, 3, step=1, splits=Some(4))
>           
> // the analysis failed here...
> {code}
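>
> For context, here is an illustrative sketch (hypothetical, not the actual code from pull request 27749) of the kind of check that prevents the mis-ordering: before extracting a `Generate` out of a `Project`, the rule needs to notice that the generator's arguments still contain aggregate functions and leave the plan for `GlobalAggregates` to rewrite first:
> {code}
> // Illustrative sketch only; the helper name and its placement are hypothetical.
> import org.apache.spark.sql.catalyst.expressions.Expression
> import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression
>
> // Returns true if any subexpression is an aggregate function call.
> def hasAggFunction(e: Expression): Boolean =
>   e.collectFirst { case ae: AggregateExpression => ae }.isDefined
>
> // ExtractGenerator would skip a Project whose generator arguments satisfy
> // hasAggFunction, so that GlobalAggregates can turn it into an Aggregate first.
> {code}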


