Posted to issues@spark.apache.org by "Ritika Maheshwari (Jira)" <ji...@apache.org> on 2022/12/02 04:23:00 UTC

[jira] (SPARK-41236) The renamed field name cannot be recognized after group filtering

    [ https://issues.apache.org/jira/browse/SPARK-41236 ]


    Ritika Maheshwari deleted comment on SPARK-41236:
    -------------------------------------------

was (Author: ritikam):
Hello Zhong,

Try renaming the field to a name different from the original column name:
select collect_set(age) as ageCol
from db_table.table1
group by name
having size(ageCol) > 1;

Your result will be zero rows, though, because you have only one age for each of your names "a", "b", and "c", so size(ageCol) > 1 will be false for every group.

But if your table instead contains:

age  name
1    "a"
2    "a"
3    "a"
4    "b"
5    "c"
6    "c"

Then you will get the result:

[1,2,3]
[5,6]
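A minimal end-to-end sketch of that workaround (the db1.table1 name and the sample rows are illustrative, mirroring the table above):

-- illustrative setup
create table db1.table1(age int, name string);
insert into db1.table1 values (1, 'a'), (2, 'a'), (3, 'a'), (4, 'b'), (5, 'c'), (6, 'c');

-- the alias ageCol no longer shadows the source column age, so HAVING can resolve it
select collect_set(age) as ageCol
from db1.table1
group by name
having size(ageCol) > 1;
-- expected: [1,2,3] for name "a" and [5,6] for name "c"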

 

 

> The renamed field name cannot be recognized after group filtering
> -----------------------------------------------------------------
>
>                 Key: SPARK-41236
>                 URL: https://issues.apache.org/jira/browse/SPARK-41236
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: jingxiong zhong
>            Priority: Major
>
> {code:java}
> select collect_set(age) as age
> from db_table.table1
> group by name
> having size(age) > 1 
> {code}
> A simple SQL statement: it works well in Spark 2.4, but doesn't work in Spark 3.2.0.
> Is it a bug or a new standard?
> h3. *like this:*
> {code:sql}
> create table db1.table1(age int, name string);
> insert into db1.table1 values(1, 'a');
> insert into db1.table1 values(2, 'b');
> insert into db1.table1 values(3, 'c');
> --then run sql like this 
> select collect_set(age) as age from db1.table1 group by name having size(age) > 1 ;
> {code}
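> A possible workaround sketch (assuming the same db1.table1): avoid referencing the alias in HAVING, either by repeating the aggregate expression or by filtering in an outer query:
> {code:sql}
> -- repeat the aggregate instead of referencing the alias
> select collect_set(age) as age
> from db1.table1
> group by name
> having size(collect_set(age)) > 1;
>
> -- or resolve the alias in an outer query
> select age
> from (select collect_set(age) as age from db1.table1 group by name) t
> where size(age) > 1;
> {code}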
> h3. Stack Information
> org.apache.spark.sql.AnalysisException: cannot resolve 'age' given input columns: [age]; line 4 pos 12;
> 'Filter (size('age, true) > 1)
> +- Aggregate [name#2], [collect_set(age#1, 0, 0) AS age#0]
>    +- SubqueryAlias spark_catalog.db1.table1
>       +- HiveTableRelation [`db1`.`table1`, org.apache.hadoop.hive.ql.io.orc.OrcSerde, Data Cols: [age#1, name#2], Partition Cols: []]
> 	at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:54)
> 	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:179)
> 	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:175)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$2(TreeNode.scala:535)
> 	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:535)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:532)
> 	at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1128)
> 	at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1127)
> 	at org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:467)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:532)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:532)
> 	at org.apache.spark.sql.catalyst.trees.BinaryLike.mapChildren(TreeNode.scala:1154)
> 	at org.apache.spark.sql.catalyst.trees.BinaryLike.mapChildren$(TreeNode.scala:1153)
> 	at org.apache.spark.sql.catalyst.expressions.BinaryExpression.mapChildren(Expression.scala:555)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:532)
> 	at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUpWithPruning$1(QueryPlan.scala:181)
> 	at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:193)
> 	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
> 	at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:193)
> 	at org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:204)
> 	at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:214)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:323)
> 	at org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:214)
> 	at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUpWithPruning(QueryPlan.scala:181)
> 	at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:161)
> 	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:175)
> 	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:94)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:263)
> 	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:94)
> 	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:91)
> 	at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:172)
> 	at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:196)
> 	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330)
> 	at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:192)
> 	at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:88)
> 	at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
> 	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:196)
> 	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
> 	at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:196)
> 	at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:88)
> 	at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:86)
> 	at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:78)
> 	at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:98)
> 	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
> 	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
> 	at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618)
> 	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
> 	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
