Posted to issues@spark.apache.org by "Ritika Maheshwari (Jira)" <ji...@apache.org> on 2022/12/02 07:33:00 UTC

[jira] [Commented] (SPARK-41236) The renamed field name cannot be recognized after group filtering

    [ https://issues.apache.org/jira/browse/SPARK-41236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17642332#comment-17642332 ] 

Ritika Maheshwari commented on SPARK-41236:
-------------------------------------------

Running the following query against Spark 3.3.0 built from the downloaded source, the error message is improved:

spark-sql> select collect_set(age) as age
         > from test2.ageGroups
         > GROUP BY name
         > having size(age) >1;

Error in query: cannot resolve 'size(age)' due to data type mismatch: argument 1 requires (array or map) type, however, 'spark_catalog.test2.agegroups.age' is of int type.; line 4 pos 7;

'Filter (size('age, true) > 1)
+- Aggregate [name#29], [collect_set(age#30, 0, 0) AS age#27]
   +- SubqueryAlias spark_catalog.test2.agegroups
      +- HiveTableRelation [`test2`.`agegroups`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [eid#28, name#29, age#30], Partition Cols: []]
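
If the goal is to filter on the aggregated set, repeating the aggregate expression inside HAVING sidesteps the alias entirely; a minimal sketch, assuming the same test2.ageGroups table:

select collect_set(age) as age
from test2.ageGroups
GROUP BY name
having size(collect_set(age)) > 1;

Here size() is applied to collect_set(age) directly, so the analyzer never has to choose between the input column and the alias.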

But this is confusing: if the analyzer recognizes age as int there, the following query should not have failed either. Instead it fails complaining that age is an array, because here the reference is getting bound to the renamed column.

spark-sql> select collect_set(age) as age
         > from test2.ageGroups
         > GROUP BY name
         > Having age >1;

Error in query: cannot resolve '(age > 1)' due to data type mismatch: differing types in '(age > 1)' (array<int> and int).; line 4 pos 7;

'Filter (age#62 > 1)
+- Aggregate [name#64], [collect_set(age#65, 0, 0) AS age#62]
   +- SubqueryAlias spark_catalog.test2.agegroups
      +- HiveTableRelation [`test2`.`agegroups`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [eid#63, name#64, age#65], Partition Cols: []]
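
One way to make the binding unambiguous is to materialize the alias in a subquery and filter outside the aggregation; again a sketch against the same table:

select *
from (select collect_set(age) as age
      from test2.ageGroups
      GROUP BY name) t
where size(age) > 1;

In the outer query, age can only refer to the array produced by collect_set, so there is no int column for it to bind to.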


> The renamed field name cannot be recognized after group filtering
> -----------------------------------------------------------------
>
>                 Key: SPARK-41236
>                 URL: https://issues.apache.org/jira/browse/SPARK-41236
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: jingxiong zhong
>            Priority: Major
>
> {code:java}
> select collect_set(age) as age
> from db_table.table1
> group by name
> having size(age) > 1 
> {code}
> A simple SQL statement: it works well in Spark 2.4 but doesn't work in Spark 3.2.0.
> Is it a bug or a new standard?
> h3. *Like this:*
> {code:sql}
> create table db1.table1(age int, name string);
> insert into db1.table1 values(1, 'a');
> insert into db1.table1 values(2, 'b');
> insert into db1.table1 values(3, 'c');
> --then run sql like this 
> select collect_set(age) as age from db1.table1 group by name having size(age) > 1 ;
> {code}
> h3. Stack Information
> org.apache.spark.sql.AnalysisException: cannot resolve 'age' given input columns: [age]; line 4 pos 12;
> 'Filter (size('age, true) > 1)
> +- Aggregate [name#2], [collect_set(age#1, 0, 0) AS age#0]
>    +- SubqueryAlias spark_catalog.db1.table1
>       +- HiveTableRelation [`db1`.`table1`, org.apache.hadoop.hive.ql.io.orc.OrcSerde, Data Cols: [age#1, name#2], Partition Cols: []]
> 	at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:54)
> 	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:179)
> 	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:175)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$2(TreeNode.scala:535)
> 	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:535)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:532)
> 	at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1128)
> 	at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1127)
> 	at org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:467)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:532)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:532)
> 	at org.apache.spark.sql.catalyst.trees.BinaryLike.mapChildren(TreeNode.scala:1154)
> 	at org.apache.spark.sql.catalyst.trees.BinaryLike.mapChildren$(TreeNode.scala:1153)
> 	at org.apache.spark.sql.catalyst.expressions.BinaryExpression.mapChildren(Expression.scala:555)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:532)
> 	at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUpWithPruning$1(QueryPlan.scala:181)
> 	at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:193)
> 	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
> 	at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:193)
> 	at org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:204)
> 	at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:214)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:323)
> 	at org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:214)
> 	at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUpWithPruning(QueryPlan.scala:181)
> 	at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:161)
> 	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:175)
> 	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:94)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:263)
> 	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:94)
> 	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:91)
> 	at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:172)
> 	at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:196)
> 	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330)
> 	at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:192)
> 	at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:88)
> 	at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
> 	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:196)
> 	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
> 	at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:196)
> 	at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:88)
> 	at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:86)
> 	at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:78)
> 	at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:98)
> 	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
> 	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
> 	at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618)
> 	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
> 	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613)


