You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/02/06 07:15:02 UTC

[GitHub] [spark] amaliujia opened a new pull request #35404: [SPARK-38118][SQL] MEAN(Boolean) in the HAVING claus should throw data mismatch error

amaliujia opened a new pull request #35404:
URL: https://github.com/apache/spark/pull/35404


   
   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
     8. If you want to add or modify an error type or message, please read the guideline first in
        'core/src/main/resources/error/README.md'.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   ```
   with t as (select true c)
   3select t.c
   4from t
   5group by t.c
   6having mean(t.c) > 0 {code}
   ```
   
   This query throws `Column 't.c' does not exist. Did you mean one of the following? [t.c]`
   
   However, mean(boolean) is not a supported function signature, thus error result should be  `cannot resolve 'mean(t.c)' due to data type mismatch: function average requires numeric or interval types, not boolean`
   
    
   
   This is because
   
   1. The mean(boolean) in HAVING was not marked as resolved in `ResolveFunctions` rule.
   2. Thus in `ResolveAggregationFunctions`, the `TempResolvedColumn` as a wrapper in `mean(TempResolvedColumn(t.c))` cannot be removed (only resolved AGG can remove its’s `TempResolvedColumn`).
   3. Thus in a later batch rule applying,  `TempResolvedColumn` was reverted and it becomes mean(`t.c`), so mean loses the information about t.c.
   4. Thus at the last step, the analyzer can only report t.c not found.
   
    
   
   mean(boolean) in HAVING is not marked as resolved in {{ResolveFunctions}} rule because 
   1. It uses Expression default `resolved` field population code 
   {code:java}
   lazy val resolved: Boolean = childrenResolved && checkInputDataTypes().isSuccess {code}
   2. During the analyzing,  mean(boolean) is mean(TempResolveColumn(boolean), thus childrenResolved is true.
   3. however checkInputDataTypes() will be false ([Average.scala#L55|[https://github.com/apache/spark/blob/74ebef243c18e7a8f32bf90ea75ab6afed9e3132/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala#L55])]
   4. Thus eventually Average's `resolved`  will be false, but it leads to wrong error message.
   
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   Improve error message so users can better debug their query.
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   
   Yes. This will change user-facing error message.
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   If benchmark tests were added, please run the benchmarks in GitHub Actions for the consistent environment, and the instructions could accord to: https://spark.apache.org/developer-tools.html#github-workflow-benchmarks.
   -->
   
   Unit Test


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in the HAVING claus should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
amaliujia commented on pull request #35404:
URL: https://github.com/apache/spark/pull/35404#issuecomment-1039694000


   @cloud-fan friendly ping


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a change in pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in the HAVING claus should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #35404:
URL: https://github.com/apache/spark/pull/35404#discussion_r805023507



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -4249,7 +4250,30 @@ object ApplyCharTypePadding extends Rule[LogicalPlan] {
  * rule right after the main resolution batch.
  */
 object RemoveTempResolvedColumn extends Rule[LogicalPlan] {
-  override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveExpressions {
-    case t: TempResolvedColumn => UnresolvedAttribute(t.nameParts)
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+    plan.foreachUp {
+      // HAVING clause will be resolved as a Filter. When having func(column with wrong data type),
+      // the column could be wrapped by a TempResolvedColumn, e.g. mean(tempresolvedcolumn(t.c)).
+      // Because TempResolvedColumn can still preserve column data type, here is a chance to check
+      // if the data type matches with the required data type of the function. We can throw an error
+      // when data types mismatches.
+      case operator: Filter =>

Review comment:
       See https://github.com/apache/spark/pull/35404#discussion_r803997624. 
   
   Unfortunately I don't have enough knowledge why removing `TempResolvedColumn` was introduced (it was introduced in https://github.com/apache/spark/pull/32470)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on pull request #35404: [SPARK-38118][SQL] MEAN(Boolean) in the HAVING claus should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
amaliujia commented on pull request #35404:
URL: https://github.com/apache/spark/pull/35404#issuecomment-1030767903


   R: @cloud-fan 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a change in pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in the HAVING claus should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #35404:
URL: https://github.com/apache/spark/pull/35404#discussion_r803988721



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -4249,7 +4250,30 @@ object ApplyCharTypePadding extends Rule[LogicalPlan] {
  * rule right after the main resolution batch.
  */
 object RemoveTempResolvedColumn extends Rule[LogicalPlan] {
-  override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveExpressions {
-    case t: TempResolvedColumn => UnresolvedAttribute(t.nameParts)
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+    plan.foreachUp {
+      // Having clause could be resolved as a Filter. When having func(column with wrong data type),
+      // the column could be wrapped by a TempResolvedColumn, e.g. mean(tempresolvedcolumn(t.c)).
+      // Because TempResolvedColumn can still preserve column data type, here is a chance to check
+      // if the data type matches with the required data type of the function. We can throw an error
+      // when data types mismatches.
+      case operator: Filter =>
+        operator.expressions.foreach(_.foreachUp {
+          case e: Expression if e.checkInputDataTypes().isFailure =>

Review comment:
       @cloud-fan 
   
   thank you! `e.childrenResolved` is a handy call and it indeeds solves problem!
   
   I am still checking error at `RemoveTempResolvedColumn`. If you actually prefer to check it at `CheckAnalysis`, let me know and I can make a change.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a change in pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in the HAVING claus should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #35404:
URL: https://github.com/apache/spark/pull/35404#discussion_r803188602



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -4249,7 +4250,30 @@ object ApplyCharTypePadding extends Rule[LogicalPlan] {
  * rule right after the main resolution batch.
  */
 object RemoveTempResolvedColumn extends Rule[LogicalPlan] {
-  override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveExpressions {
-    case t: TempResolvedColumn => UnresolvedAttribute(t.nameParts)
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+    plan.foreachUp {
+      // Having clause could be resolved as a Filter. When having func(column with wrong data type),
+      // the column could be wrapped by a TempResolvedColumn, e.g. mean(tempresolvedcolumn(t.c)).
+      // Because TempResolvedColumn can still preserve column data type, here is a chance to check
+      // if the data type matches with the required data type of the function. We can throw an error
+      // when data types mismatches.
+      case operator: Filter =>
+        operator.expressions.foreach(_.foreachUp {
+          case e: Expression if e.checkInputDataTypes().isFailure =>

Review comment:
       @cloud-fan 
   
   I updated this PR based on our discussion.
   
   Right now my implementation could fail a few queries (for example SQLQuerySuite.normalize special floating numbers in subquery). 
   
   ```
   Invalid call to dataType on unresolved object
   org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to dataType on unresolved object
   	at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.dataType(unresolved.scala:137)
   	at org.apache.spark.sql.catalyst.expressions.BinaryOperator.checkInputDataTypes(Expression.scala:713)
   	at org.apache.spark.sql.catalyst.expressions.BinaryComparison.checkInputDataTypes(predicates.scala:908)
   	at org.apache.spark.sql.catalyst.analysis.RemoveTempResolvedColumn$.$anonfun$apply$82(Analyzer.scala:4262)
   
   ```
   
   I believe it is because I called `checkInputDataTypes()` on a binary comparator that has at least one side as `UnresolvedAttribute`. Do you know how to filter such case in my implementation? 
   
   I am not familiar with scala and our codebase. Any help will be much appreciated. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a change in pull request #35404: [SPARK-38118][SQL] MEAN(Boolean) in the HAVING claus should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #35404:
URL: https://github.com/apache/spark/pull/35404#discussion_r800938632



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala
##########
@@ -46,6 +46,8 @@ case class Average(
 
   def this(child: Expression) = this(child, failOnError = SQLConf.get.ansiEnabled)
 
+  override lazy val resolved: Boolean = children.forall(_.resolved)

Review comment:
       Do you agree the way in this PR for Average? If so I can go to check existing expressions and make them follow the same way?
   
   Basically if we make one expression as resolved when `all children is solved` && `data type match`, in the case of the TempResolveColumn usage, we could have wrong error message in the later stage (it reports `column not found`, which is not true).
   
   If we change to say such expression can be marked as resolved only `all children is solved`, later in the CheckAnalysis, it will report data mismatch. 
   
   If the proposal change is not the good way to fix this issue, is there a better way?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a change in pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in the HAVING claus should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #35404:
URL: https://github.com/apache/spark/pull/35404#discussion_r805023663



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
##########
@@ -645,4 +645,9 @@ case class TempResolvedColumn(child: Expression, nameParts: Seq[String]) extends
   override def dataType: DataType = child.dataType
   override protected def withNewChildInternal(newChild: Expression): Expression =
     copy(child = newChild)
+
+  override def sql: String = {
+    val childrenSQL = children.map(_.sql).mkString(", ")

Review comment:
       seems just equivalent? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] allisonwang-db commented on a change in pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in the HAVING claus should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
allisonwang-db commented on a change in pull request #35404:
URL: https://github.com/apache/spark/pull/35404#discussion_r805015914



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -4249,7 +4250,30 @@ object ApplyCharTypePadding extends Rule[LogicalPlan] {
  * rule right after the main resolution batch.
  */
 object RemoveTempResolvedColumn extends Rule[LogicalPlan] {
-  override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveExpressions {
-    case t: TempResolvedColumn => UnresolvedAttribute(t.nameParts)
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+    plan.foreachUp {
+      // HAVING clause will be resolved as a Filter. When having func(column with wrong data type),
+      // the column could be wrapped by a TempResolvedColumn, e.g. mean(tempresolvedcolumn(t.c)).
+      // Because TempResolvedColumn can still preserve column data type, here is a chance to check
+      // if the data type matches with the required data type of the function. We can throw an error
+      // when data types mismatches.
+      case operator: Filter =>

Review comment:
       How about handling `TempResolvedColumn` in `CheckAnalysis`? For example:
   https://github.com/apache/spark/blob/25a4c5fa84d64e37cf5c27c7b2f0f29867330bf2/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L161-L166
   Are there new issues if we keep TempResolvedColumn during the analysis?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #35404: [SPARK-38118][SQL] MEAN(Boolean) in the HAVING claus should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #35404:
URL: https://github.com/apache/spark/pull/35404#issuecomment-1030770108


   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a change in pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in the HAVING claus should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #35404:
URL: https://github.com/apache/spark/pull/35404#discussion_r803988721



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -4249,7 +4250,30 @@ object ApplyCharTypePadding extends Rule[LogicalPlan] {
  * rule right after the main resolution batch.
  */
 object RemoveTempResolvedColumn extends Rule[LogicalPlan] {
-  override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveExpressions {
-    case t: TempResolvedColumn => UnresolvedAttribute(t.nameParts)
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+    plan.foreachUp {
+      // Having clause could be resolved as a Filter. When having func(column with wrong data type),
+      // the column could be wrapped by a TempResolvedColumn, e.g. mean(tempresolvedcolumn(t.c)).
+      // Because TempResolvedColumn can still preserve column data type, here is a chance to check
+      // if the data type matches with the required data type of the function. We can throw an error
+      // when data types mismatches.
+      case operator: Filter =>
+        operator.expressions.foreach(_.foreachUp {
+          case e: Expression if e.checkInputDataTypes().isFailure =>

Review comment:
       @cloud-fan 
   
   thank you! `e.childrenResolved` is a handy call and it indeed solves problem!
   
   I am still checking error at `RemoveTempResolvedColumn`. If you actually prefer to check it at `CheckAnalysis`, let me know and I can make a change.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in the HAVING claus should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #35404:
URL: https://github.com/apache/spark/pull/35404#discussion_r804393674



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
##########
@@ -4281,6 +4281,44 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
         Row(2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22) :: Nil)
     }
   }
+
+  test("SPARK-38118: Func(wrong_type) in the HAVING clause should throw data mismatch error") {
+    val meanErrorMsg = s"cannot resolve 'mean(t.c)' due to data type " +
+      "mismatch: function average requires numeric or interval types, not boolean"
+    val absErrorMsg = s"cannot resolve 'abs(t.c)' due to data type " +
+      "mismatch: argument 1 requires (numeric or interval day to second or interval year to " +
+      "month) type, however, 't.c' is of boolean type."
+
+    Seq("mean", "abs").foreach { func =>
+      val e1 = intercept[AnalysisException](
+        sql(
+          s"""
+             |WITH t as (SELECT true c)
+             |SELECT t.c
+             |FROM t
+             |GROUP BY t.c
+             |HAVING ${func}(t.c) > 0d""".stripMargin))
+      if (func == "mean") {
+        assert(e1.message.contains(meanErrorMsg))

Review comment:
       we don't need to check the full error message
   ```
   assert(e1.message.contains(s"cannot resolve '$func(t.c)' due to data type mismatch"))
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a change in pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in the HAVING claus should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #35404:
URL: https://github.com/apache/spark/pull/35404#discussion_r803188602



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -4249,7 +4250,30 @@ object ApplyCharTypePadding extends Rule[LogicalPlan] {
  * rule right after the main resolution batch.
  */
 object RemoveTempResolvedColumn extends Rule[LogicalPlan] {
-  override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveExpressions {
-    case t: TempResolvedColumn => UnresolvedAttribute(t.nameParts)
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+    plan.foreachUp {
+      // Having clause could be resolved as a Filter. When having func(column with wrong data type),
+      // the column could be wrapped by a TempResolvedColumn, e.g. mean(tempresolvedcolumn(t.c)).
+      // Because TempResolvedColumn can still preserve column data type, here is a chance to check
+      // if the data type matches with the required data type of the function. We can throw an error
+      // when data types mismatches.
+      case operator: Filter =>
+        operator.expressions.foreach(_.foreachUp {
+          case e: Expression if e.checkInputDataTypes().isFailure =>

Review comment:
       @cloud-fan 
   
   I updated this PR based on our discussion.
   
   Right now my implementation could fail a few queries (for example SQLQuerySuite.normalize special floating numbers in subquery). 
   
   ```
   Invalid call to dataType on unresolved object
   org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to dataType on unresolved object
   	at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.dataType(unresolved.scala:137)
   	at org.apache.spark.sql.catalyst.expressions.BinaryOperator.checkInputDataTypes(Expression.scala:713)
   	at org.apache.spark.sql.catalyst.expressions.BinaryComparison.checkInputDataTypes(predicates.scala:908)
   	at org.apache.spark.sql.catalyst.analysis.RemoveTempResolvedColumn$.$anonfun$apply$82(Analyzer.scala:4262)
   
   ```
   
   I believe it is because I called `checkInputDataTypes()` on a binary comparator that has at least one side as `UnresolvedAttribute`. Do you know how to filter such case in my implementation? Basically how to filter out an Expression based on its nested children if there is any one is `UnresolvedAttribute`?
   
   I am not familiar with scala and our codebase. Any help will be much appreciated. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in the HAVING claus should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
amaliujia commented on pull request #35404:
URL: https://github.com/apache/spark/pull/35404#issuecomment-1042341021


   Conflicts were resolved.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in the HAVING claus should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #35404:
URL: https://github.com/apache/spark/pull/35404#discussion_r804318647



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
##########
@@ -4281,6 +4281,53 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
         Row(2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22) :: Nil)
     }
   }
+
+  test("SPARK-38118: Func(wrong_type) in the HAVING claus should throw data mismatch error") {
+    val e1 = intercept[AnalysisException](
+      sql(
+        """
+          |with t as (select true c)

Review comment:
       nit: can we upper-case the SQL keyword?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in the HAVING claus should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #35404:
URL: https://github.com/apache/spark/pull/35404#discussion_r804381525



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
##########
@@ -4281,6 +4281,54 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
         Row(2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22) :: Nil)
     }
   }
+
+  test("SPARK-38118: Func(wrong_type) in the HAVING clause should throw data mismatch error") {
+    val e1 = intercept[AnalysisException](
+      sql(
+        """
+          |WITH t as (SELECT true c)
+          |SELECT t.c
+          |FROM t
+          |GROUP BY t.c
+          |HAVING mean(t.c) > 0d""".stripMargin))
+    assert(e1.message.contains("cannot resolve 'mean(t.c)' due to data type " +
+      "mismatch: function average requires numeric or interval types, not boolean"))
+
+    val e2 = intercept[AnalysisException](
+      sql(
+        """
+          |WITH t as (SELECT true c, false d)
+          |SELECT (t.c AND t.d) c
+          |FROM t
+          |GROUP BY t.c
+          |HAVING mean(c) > 0d""".stripMargin))
+    assert(e2.message.contains("cannot resolve 'mean(t.c)' due to data type mismatch: function " +
+      "average requires numeric or interval types, not boolean"))
+
+    val e3 = intercept[AnalysisException](
+      sql(
+        """
+          |WITH t as (SELECT true c, false d)
+          |SELECT (t.c AND t.d) c
+          |FROM t
+          |GROUP BY t.c
+          |HAVING abs(c) > 0d""".stripMargin))
+    assert(e3.message.contains("cannot resolve 'abs(t.c)' due to data type " +
+      "mismatch: argument 1 requires (numeric or interval day to second or interval year to " +
+      "month) type, however, 't.c' is of boolean type."))
+
+    val e4 = intercept[AnalysisException](
+      sql(
+        """
+          |WITH t as (SELECT true c)
+          |SELECT t.c
+          |FROM t
+          |GROUP BY t.c
+          |HAVING abs(t.c) > 0d""".stripMargin))

Review comment:
       seems the only difference between this test and the first one is we replace `mean` with `abs`. How about
   ```
   Seq("mean", "abs").foreach {  func =>
     val e1 = intercept...
       ...
       |HAVING $func(t.c) ...
       ...
   
     val e2 = ...
   }
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in the HAVING claus should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #35404:
URL: https://github.com/apache/spark/pull/35404#discussion_r804319206



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
##########
@@ -4281,6 +4281,53 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
         Row(2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22) :: Nil)
     }
   }
+
+  test("SPARK-38118: Func(wrong_type) in the HAVING claus should throw data mismatch error") {
+    val e1 = intercept[AnalysisException](
+      sql(
+        """
+          |with t as (select true c)
+          |select t.c
+          |from t
+          |group by t.c
+          |having mean(t.c) > 0d""".stripMargin))
+    assert(e1.message.contains("cannot resolve 'mean(tempresolvedcolumn(t.c))' due to data type " +
+      "mismatch: function average requires numeric or interval types, not boolean"))
+
+    val e2 = intercept[AnalysisException](
+      sql(
+        """
+          |with t as (select true c, false d)
+          |select (t.c & t.d) c
+          |from t
+          |group by t.c
+          |having mean(c) > 0d""".stripMargin))
+    assert(e2.message.contains("cannot resolve '(t.c & t.d)' due to data type mismatch: " +
+      "'(t.c & t.d)' requires integral type, not boolean"))
+
+    val e3 = intercept[AnalysisException](
+      sql(
+        """
+          |with t as (select true c, false d)
+          |select (t.c & t.d) c
+          |from t
+          |group by t.c
+          |having abs(c) > 0d""".stripMargin))
+    assert(e3.message.contains("cannot resolve '(t.c & t.d)' due to data type mismatch: " +
+      "'(t.c & t.d)' requires integral type, not boolean"))
+
+    val e4 = intercept[AnalysisException](
+      sql(
+        """
+          |with t as (select true c, false d)

Review comment:
       column d is not used.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in the HAVING claus should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #35404:
URL: https://github.com/apache/spark/pull/35404#discussion_r804319062



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
##########
@@ -4281,6 +4281,53 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
         Row(2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22) :: Nil)
     }
   }
+
+  test("SPARK-38118: Func(wrong_type) in the HAVING claus should throw data mismatch error") {
+    val e1 = intercept[AnalysisException](
+      sql(
+        """
+          |with t as (select true c)
+          |select t.c
+          |from t
+          |group by t.c
+          |having mean(t.c) > 0d""".stripMargin))
+    assert(e1.message.contains("cannot resolve 'mean(tempresolvedcolumn(t.c))' due to data type " +
+      "mismatch: function average requires numeric or interval types, not boolean"))
+
+    val e2 = intercept[AnalysisException](
+      sql(
+        """
+          |with t as (select true c, false d)
+          |select (t.c & t.d) c

Review comment:
       I think you want to test `t.c AND t.d`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in the HAVING claus should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #35404:
URL: https://github.com/apache/spark/pull/35404#discussion_r804318278



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -4249,7 +4250,30 @@ object ApplyCharTypePadding extends Rule[LogicalPlan] {
  * rule right after the main resolution batch.
  */
 object RemoveTempResolvedColumn extends Rule[LogicalPlan] {
-  override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveExpressions {
-    case t: TempResolvedColumn => UnresolvedAttribute(t.nameParts)
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+    plan.foreachUp {
+      // Having clause could be resolved as a Filter. When having func(column with wrong data type),

Review comment:
       ```suggestion
         // HAVING clause will be resolved as a Filter. When having func(column with wrong data type),
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a change in pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in HAVING clause should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #35404:
URL: https://github.com/apache/spark/pull/35404#discussion_r809487964



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
##########
@@ -4294,6 +4294,31 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
         Row(3, 2, 6) :: Nil)
     }
   }
+
+  test("SPARK-38118: Func(wrong_type) in the HAVING clause should throw data mismatch error") {

Review comment:
       I found that I can easily put the existing tests into `AnalysisSuite`. Per my understanding you are suggesting we put test into `catalyst` as this is change in `catalyst`.  So I ended up with **moving** test cases to `AnalysisSuite`. Please let me know if you think we should also have tests in `SQLQuerySuite`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in HAVING clause should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #35404:
URL: https://github.com/apache/spark/pull/35404#discussion_r808744580



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
##########
@@ -4294,6 +4294,31 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
         Row(3, 2, 6) :: Nil)
     }
   }
+
+  test("SPARK-38118: Func(wrong_type) in the HAVING clause should throw data mismatch error") {
+    Seq("mean", "abs").foreach { func =>
+      val e1 = intercept[AnalysisException](
+        sql(
+          s"""
+             |WITH t as (SELECT true c)
+             |SELECT t.c
+             |FROM t
+             |GROUP BY t.c
+             |HAVING ${func}(t.c) > 0d""".stripMargin))
+

Review comment:
       Please remove redundant empty line here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a change in pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in the HAVING claus should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #35404:
URL: https://github.com/apache/spark/pull/35404#discussion_r803190492



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
##########
@@ -4281,6 +4281,53 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
         Row(2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22) :: Nil)
     }
   }
+
+  test("SPARK-38118: Func(wrong_type) in the HAVING claus should throw data mismatch error") {

Review comment:
       All these tests case can pass




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in the HAVING claus should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #35404:
URL: https://github.com/apache/spark/pull/35404#discussion_r803306323



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -4249,7 +4250,30 @@ object ApplyCharTypePadding extends Rule[LogicalPlan] {
  * rule right after the main resolution batch.
  */
 object RemoveTempResolvedColumn extends Rule[LogicalPlan] {
-  override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveExpressions {
-    case t: TempResolvedColumn => UnresolvedAttribute(t.nameParts)
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+    plan.foreachUp {
+      // Having clause could be resolved as a Filter. When having func(column with wrong data type),
+      // the column could be wrapped by a TempResolvedColumn, e.g. mean(tempresolvedcolumn(t.c)).
+      // Because TempResolvedColumn can still preserve column data type, here is a chance to check
+      // if the data type matches with the required data type of the function. We can throw an error
+      // when data types mismatches.
+      case operator: Filter =>
+        operator.expressions.foreach(_.foreachUp {
+          case e: Expression if e.checkInputDataTypes().isFailure =>

Review comment:
       A straightforward fix is to change it to `case e: Expression if e.childrenResolved && e.checkInputDataTypes().isFailure`.
   
   A new idea is to not strip `TempResolvedColumn`, as `TempResolvedColumn` always means a failure. We can update `CheckAnalysis` to handle it, i.e. adding a new case after `case e: Expression if e.checkInputDataTypes().isFailure =>`
   ```
   case t: TempResolvedColumn =>
     val a = UnresolvedAttribute(t.nameParts)
     the same code that handles "case a: Attribute if !a.resolved"
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in the HAVING claus should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #35404:
URL: https://github.com/apache/spark/pull/35404#discussion_r804318808



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
##########
@@ -4281,6 +4281,53 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
         Row(2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22) :: Nil)
     }
   }
+
+  test("SPARK-38118: Func(wrong_type) in the HAVING claus should throw data mismatch error") {
+    val e1 = intercept[AnalysisException](
+      sql(
+        """
+          |with t as (select true c)
+          |select t.c
+          |from t
+          |group by t.c
+          |having mean(t.c) > 0d""".stripMargin))
+    assert(e1.message.contains("cannot resolve 'mean(tempresolvedcolumn(t.c))' due to data type " +

Review comment:
       shall we strip tempresolvedcolumn before generating the error message?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a change in pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in the HAVING claus should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #35404:
URL: https://github.com/apache/spark/pull/35404#discussion_r804338415



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
##########
@@ -4281,6 +4281,53 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
         Row(2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22) :: Nil)
     }
   }
+
+  test("SPARK-38118: Func(wrong_type) in the HAVING claus should throw data mismatch error") {
+    val e1 = intercept[AnalysisException](
+      sql(
+        """
+          |with t as (select true c)
+          |select t.c
+          |from t
+          |group by t.c
+          |having mean(t.c) > 0d""".stripMargin))
+    assert(e1.message.contains("cannot resolve 'mean(tempresolvedcolumn(t.c))' due to data type " +

Review comment:
       I override `def sql:string` in `TempResolvedColumn` to strip tempresolvedcolumn in error message. let me know if there is a better way to do so.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a change in pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in the HAVING claus should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #35404:
URL: https://github.com/apache/spark/pull/35404#discussion_r803191792



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -4249,7 +4250,30 @@ object ApplyCharTypePadding extends Rule[LogicalPlan] {
  * rule right after the main resolution batch.
  */
 object RemoveTempResolvedColumn extends Rule[LogicalPlan] {
-  override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveExpressions {
-    case t: TempResolvedColumn => UnresolvedAttribute(t.nameParts)
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+    plan.foreachUp {
+      // Having clause could be resolved as a Filter. When having func(column with wrong data type),
+      // the column could be wrapped by a TempResolvedColumn, e.g. mean(tempresolvedcolumn(t.c)).
+      // Because TempResolvedColumn can still preserve column data type, here is a chance to check
+      // if the data type matches with the required data type of the function. We can throw an error
+      // when data types mismatches.
+      case operator: Filter =>
+        operator.expressions.foreach(_.foreachUp {
+          case e: Expression if e.checkInputDataTypes().isFailure =>

Review comment:
       I tried those functions like `transformWithPruning`, unfortunately didn't make it work.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a change in pull request #35404: [SPARK-38118][SQL] MEAN(Boolean) in the HAVING claus should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #35404:
URL: https://github.com/apache/spark/pull/35404#discussion_r800221992



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala
##########
@@ -46,6 +46,8 @@ case class Average(
 
   def this(child: Expression) = this(child, failOnError = SQLConf.get.ansiEnabled)
 
+  override lazy val resolved: Boolean = children.forall(_.resolved)

Review comment:
       Before this change, this `resolved` field of Average is set based on both children all resolved and input data type match (the default Expression implementation). However this will lead to un-resolved column name with Average in HAVING due to the logic of TempResolveColumn, which lead to column not found (but the column was found).
   
   If we set this field only based on if all children are resolved, later in `CheckAnalysis` it will check expression input data type, and throw right `data type mismatch` error: https://github.com/apache/spark/blob/0d56c947f10f747ab4b76426b2d6a34a1d3b8277/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L189




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #35404: [SPARK-38118][SQL] MEAN(Boolean) in the HAVING claus should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #35404:
URL: https://github.com/apache/spark/pull/35404#discussion_r800325130



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala
##########
@@ -46,6 +46,8 @@ case class Average(
 
   def this(child: Expression) = this(child, failOnError = SQLConf.get.ansiEnabled)
 
+  override lazy val resolved: Boolean = children.forall(_.resolved)

Review comment:
       It's intentional that `Expression.resolved` also checks the input data types. It seems weird to me that we only change it for `Average`...




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun edited a comment on pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in HAVING clause should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #35404:
URL: https://github.com/apache/spark/pull/35404#issuecomment-1043656446


   Merged to master for Apache Spark 3.3.0.
   
   @amaliujia . I added you to the Apache Spark contributor group and assigned SPARK-38118 to you.
   Welcome to the Apache Spark community.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun closed pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in HAVING clause should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun closed pull request #35404:
URL: https://github.com/apache/spark/pull/35404


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] allisonwang-db commented on a change in pull request #35404: [SPARK-38118][SQL] MEAN(Boolean) in the HAVING claus should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
allisonwang-db commented on a change in pull request #35404:
URL: https://github.com/apache/spark/pull/35404#discussion_r800993991



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
##########
@@ -4281,6 +4281,19 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
         Row(2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22) :: Nil)
     }
   }
+
+  test("SPARK-38118: MEAN(Boolean) in the HAVING claus should throw data mismatch error") {

Review comment:
       nit typo: `claus` -> `clause`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a change in pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in the HAVING claus should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #35404:
URL: https://github.com/apache/spark/pull/35404#discussion_r804911364



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
##########
@@ -4281,6 +4281,44 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
         Row(2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22) :: Nil)
     }
   }
+
+  test("SPARK-38118: Func(wrong_type) in the HAVING clause should throw data mismatch error") {
+    val meanErrorMsg = s"cannot resolve 'mean(t.c)' due to data type " +
+      "mismatch: function average requires numeric or interval types, not boolean"
+    val absErrorMsg = s"cannot resolve 'abs(t.c)' due to data type " +
+      "mismatch: argument 1 requires (numeric or interval day to second or interval year to " +
+      "month) type, however, 't.c' is of boolean type."
+
+    Seq("mean", "abs").foreach { func =>
+      val e1 = intercept[AnalysisException](
+        sql(
+          s"""
+             |WITH t as (SELECT true c)
+             |SELECT t.c
+             |FROM t
+             |GROUP BY t.c
+             |HAVING ${func}(t.c) > 0d""".stripMargin))
+      if (func == "mean") {
+        assert(e1.message.contains(meanErrorMsg))

Review comment:
       I see. Done!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in HAVING clause should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
amaliujia commented on pull request #35404:
URL: https://github.com/apache/spark/pull/35404#issuecomment-1043665577


   Thank you! @dongjoon-hyun 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a change in pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in the HAVING claus should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #35404:
URL: https://github.com/apache/spark/pull/35404#discussion_r803997624



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -4249,7 +4250,30 @@ object ApplyCharTypePadding extends Rule[LogicalPlan] {
  * rule right after the main resolution batch.
  */
 object RemoveTempResolvedColumn extends Rule[LogicalPlan] {
-  override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveExpressions {
-    case t: TempResolvedColumn => UnresolvedAttribute(t.nameParts)
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+    plan.foreachUp {
+      // Having clause could be resolved as a Filter. When having func(column with wrong data type),
+      // the column could be wrapped by a TempResolvedColumn, e.g. mean(tempresolvedcolumn(t.c)).
+      // Because TempResolvedColumn can still preserve column data type, here is a chance to check
+      // if the data type matches with the required data type of the function. We can throw an error
+      // when data types mismatches.
+      case operator: Filter =>
+        operator.expressions.foreach(_.foreachUp {
+          case e: Expression if e.checkInputDataTypes().isFailure =>

Review comment:
       In fact I am not sure the consequence of not striping `TempResolvedColumn`. I would guess there was a reason to add that `RemoveTempResolvedColumn` rule/batch? Otherwise why not deal with `TempResolvedColumn` in `CheckAnalysis` when `TempResolvedColumn` was introduced?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a change in pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in the HAVING claus should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #35404:
URL: https://github.com/apache/spark/pull/35404#discussion_r803190492



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
##########
@@ -4281,6 +4281,53 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
         Row(2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22) :: Nil)
     }
   }
+
+  test("SPARK-38118: Func(wrong_type) in the HAVING claus should throw data mismatch error") {

Review comment:
       All these tests case can by pass




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a change in pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in the HAVING claus should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #35404:
URL: https://github.com/apache/spark/pull/35404#discussion_r804386202



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
##########
@@ -4281,6 +4281,54 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
         Row(2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22) :: Nil)
     }
   }
+
+  test("SPARK-38118: Func(wrong_type) in the HAVING clause should throw data mismatch error") {
+    val e1 = intercept[AnalysisException](
+      sql(
+        """
+          |WITH t as (SELECT true c)
+          |SELECT t.c
+          |FROM t
+          |GROUP BY t.c
+          |HAVING mean(t.c) > 0d""".stripMargin))
+    assert(e1.message.contains("cannot resolve 'mean(t.c)' due to data type " +
+      "mismatch: function average requires numeric or interval types, not boolean"))
+
+    val e2 = intercept[AnalysisException](
+      sql(
+        """
+          |WITH t as (SELECT true c, false d)
+          |SELECT (t.c AND t.d) c
+          |FROM t
+          |GROUP BY t.c
+          |HAVING mean(c) > 0d""".stripMargin))
+    assert(e2.message.contains("cannot resolve 'mean(t.c)' due to data type mismatch: function " +
+      "average requires numeric or interval types, not boolean"))
+
+    val e3 = intercept[AnalysisException](
+      sql(
+        """
+          |WITH t as (SELECT true c, false d)
+          |SELECT (t.c AND t.d) c
+          |FROM t
+          |GROUP BY t.c
+          |HAVING abs(c) > 0d""".stripMargin))
+    assert(e3.message.contains("cannot resolve 'abs(t.c)' due to data type " +
+      "mismatch: argument 1 requires (numeric or interval day to second or interval year to " +
+      "month) type, however, 't.c' is of boolean type."))
+
+    val e4 = intercept[AnalysisException](
+      sql(
+        """
+          |WITH t as (SELECT true c)
+          |SELECT t.c
+          |FROM t
+          |GROUP BY t.c
+          |HAVING abs(t.c) > 0d""".stripMargin))

Review comment:
       Good idea! Done!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #35404: [SPARK-38118][SQL] MEAN(Boolean) in the HAVING claus should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #35404:
URL: https://github.com/apache/spark/pull/35404#issuecomment-1030993887


   cc @allisonwang-db FYI


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in the HAVING claus should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #35404:
URL: https://github.com/apache/spark/pull/35404#discussion_r804381525



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
##########
@@ -4281,6 +4281,54 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
         Row(2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22) :: Nil)
     }
   }
+
+  test("SPARK-38118: Func(wrong_type) in the HAVING clause should throw data mismatch error") {
+    val e1 = intercept[AnalysisException](
+      sql(
+        """
+          |WITH t as (SELECT true c)
+          |SELECT t.c
+          |FROM t
+          |GROUP BY t.c
+          |HAVING mean(t.c) > 0d""".stripMargin))
+    assert(e1.message.contains("cannot resolve 'mean(t.c)' due to data type " +
+      "mismatch: function average requires numeric or interval types, not boolean"))
+
+    val e2 = intercept[AnalysisException](
+      sql(
+        """
+          |WITH t as (SELECT true c, false d)
+          |SELECT (t.c AND t.d) c
+          |FROM t
+          |GROUP BY t.c
+          |HAVING mean(c) > 0d""".stripMargin))
+    assert(e2.message.contains("cannot resolve 'mean(t.c)' due to data type mismatch: function " +
+      "average requires numeric or interval types, not boolean"))
+
+    val e3 = intercept[AnalysisException](
+      sql(
+        """
+          |WITH t as (SELECT true c, false d)
+          |SELECT (t.c AND t.d) c
+          |FROM t
+          |GROUP BY t.c
+          |HAVING abs(c) > 0d""".stripMargin))
+    assert(e3.message.contains("cannot resolve 'abs(t.c)' due to data type " +
+      "mismatch: argument 1 requires (numeric or interval day to second or interval year to " +
+      "month) type, however, 't.c' is of boolean type."))
+
+    val e4 = intercept[AnalysisException](
+      sql(
+        """
+          |WITH t as (SELECT true c)
+          |SELECT t.c
+          |FROM t
+          |GROUP BY t.c
+          |HAVING abs(t.c) > 0d""".stripMargin))

Review comment:
       seems the only difference between this test and the first one is we replace `mean` with `abs`. How about
   ```
   Seq("mean", "abs").foreach {  func =>
     val e1 = intercept...
       ...
       |HAVING $func(t.c) ...
       ...
   }
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a change in pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in the HAVING claus should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #35404:
URL: https://github.com/apache/spark/pull/35404#discussion_r804339260



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
##########
@@ -4281,6 +4281,53 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
         Row(2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22) :: Nil)
     }
   }
+
+  test("SPARK-38118: Func(wrong_type) in the HAVING claus should throw data mismatch error") {
+    val e1 = intercept[AnalysisException](
+      sql(
+        """
+          |with t as (select true c)
+          |select t.c
+          |from t
+          |group by t.c
+          |having mean(t.c) > 0d""".stripMargin))
+    assert(e1.message.contains("cannot resolve 'mean(tempresolvedcolumn(t.c))' due to data type " +
+      "mismatch: function average requires numeric or interval types, not boolean"))
+
+    val e2 = intercept[AnalysisException](
+      sql(
+        """
+          |with t as (select true c, false d)
+          |select (t.c & t.d) c
+          |from t
+          |group by t.c
+          |having mean(c) > 0d""".stripMargin))
+    assert(e2.message.contains("cannot resolve '(t.c & t.d)' due to data type mismatch: " +
+      "'(t.c & t.d)' requires integral type, not boolean"))
+
+    val e3 = intercept[AnalysisException](
+      sql(
+        """
+          |with t as (select true c, false d)
+          |select (t.c & t.d) c
+          |from t
+          |group by t.c
+          |having abs(c) > 0d""".stripMargin))
+    assert(e3.message.contains("cannot resolve '(t.c & t.d)' due to data type mismatch: " +
+      "'(t.c & t.d)' requires integral type, not boolean"))
+
+    val e4 = intercept[AnalysisException](
+      sql(
+        """
+          |with t as (select true c, false d)

Review comment:
       nice catch!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in HAVING clause should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #35404:
URL: https://github.com/apache/spark/pull/35404#discussion_r808740696



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
##########
@@ -645,4 +645,8 @@ case class TempResolvedColumn(child: Expression, nameParts: Seq[String]) extends
   override def dataType: DataType = child.dataType
   override protected def withNewChildInternal(newChild: Expression): Expression =
     copy(child = newChild)
+
+  override def sql: String = {
+    s"${child.sql}"
+  }

Review comment:
       This looks a little redundant. Shall we revise like the following?
   ```scala
   override def sql: String = child.sql
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a change in pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in HAVING clause should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #35404:
URL: https://github.com/apache/spark/pull/35404#discussion_r809486612



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
##########
@@ -645,4 +645,8 @@ case class TempResolvedColumn(child: Expression, nameParts: Seq[String]) extends
   override def dataType: DataType = child.dataType
   override protected def withNewChildInternal(newChild: Expression): Expression =
     copy(child = newChild)
+
+  override def sql: String = {
+    s"${child.sql}"
+  }

Review comment:
       done!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a change in pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in the HAVING claus should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #35404:
URL: https://github.com/apache/spark/pull/35404#discussion_r808480866



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
##########
@@ -645,4 +645,9 @@ case class TempResolvedColumn(child: Expression, nameParts: Seq[String]) extends
   override def dataType: DataType = child.dataType
   override protected def withNewChildInternal(newChild: Expression): Expression =
     copy(child = newChild)
+
+  override def sql: String = {
+    val childrenSQL = children.map(_.sql).mkString(", ")

Review comment:
       done!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in HAVING clause should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #35404:
URL: https://github.com/apache/spark/pull/35404#discussion_r808744374



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
##########
@@ -4294,6 +4294,31 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
         Row(3, 2, 6) :: Nil)
     }
   }
+
+  test("SPARK-38118: Func(wrong_type) in the HAVING clause should throw data mismatch error") {

Review comment:
       Thank you for adding this. Do you think you can add a test coverage in `catalyst` module too because your main contribution is inside `catalyst` module.
   
   We recommend to keep the patch main code and test code in **the same module closely** in order to maintain the modules more completely by themselves and to test them more easily.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in HAVING clause should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #35404:
URL: https://github.com/apache/spark/pull/35404#discussion_r808842045



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
##########
@@ -4294,6 +4294,31 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
         Row(3, 2, 6) :: Nil)
     }
   }
+
+  test("SPARK-38118: Func(wrong_type) in the HAVING clause should throw data mismatch error") {

Review comment:
       ideally we should add UT in catalyst as well, such as `AnalysisSuite`. Not sure how hard it is for this case though...




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in HAVING clause should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #35404:
URL: https://github.com/apache/spark/pull/35404#issuecomment-1043656446


   Merged to master for Apache Spark 3.3.0.
   
   @amaliujia . I added you to the Apache Spark contributor group and assigned SPARK-38118 to you. Welcome!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] allisonwang-db commented on a change in pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in the HAVING claus should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
allisonwang-db commented on a change in pull request #35404:
URL: https://github.com/apache/spark/pull/35404#discussion_r805016815



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
##########
@@ -645,4 +645,9 @@ case class TempResolvedColumn(child: Expression, nameParts: Seq[String]) extends
   override def dataType: DataType = child.dataType
   override protected def withNewChildInternal(newChild: Expression): Expression =
     copy(child = newChild)
+
+  override def sql: String = {
+    val childrenSQL = children.map(_.sql).mkString(", ")

Review comment:
       `child.sql`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in the HAVING claus should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
amaliujia commented on pull request #35404:
URL: https://github.com/apache/spark/pull/35404#issuecomment-1039694000


   @cloud-fan friendly ping


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in the HAVING claus should throw data mismatch error

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #35404:
URL: https://github.com/apache/spark/pull/35404#discussion_r807594281



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
##########
@@ -645,4 +645,9 @@ case class TempResolvedColumn(child: Expression, nameParts: Seq[String]) extends
   override def dataType: DataType = child.dataType
   override protected def withNewChildInternal(newChild: Expression): Expression =
     copy(child = newChild)
+
+  override def sql: String = {
+    val childrenSQL = children.map(_.sql).mkString(", ")

Review comment:
       yea equivalent, but `child.sql` is simpler.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org