You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/02/09 23:56:13 UTC

[GitHub] [spark] amaliujia commented on a change in pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in the HAVING claus should throw data mismatch error

amaliujia commented on a change in pull request #35404:
URL: https://github.com/apache/spark/pull/35404#discussion_r803188602



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -4249,7 +4250,30 @@ object ApplyCharTypePadding extends Rule[LogicalPlan] {
  * rule right after the main resolution batch.
  */
 object RemoveTempResolvedColumn extends Rule[LogicalPlan] {
-  override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveExpressions {
-    case t: TempResolvedColumn => UnresolvedAttribute(t.nameParts)
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+    plan.foreachUp {
+      // Having clause could be resolved as a Filter. When having func(column with wrong data type),
+      // the column could be wrapped by a TempResolvedColumn, e.g. mean(tempresolvedcolumn(t.c)).
+      // Because TempResolvedColumn can still preserve column data type, here is a chance to check
+      // if the data type matches with the required data type of the function. We can throw an error
+      // when data types mismatches.
+      case operator: Filter =>
+        operator.expressions.foreach(_.foreachUp {
+          case e: Expression if e.checkInputDataTypes().isFailure =>

Review comment:
       @cloud-fan 
   
   I updated this PR based on our discussion.
   
   Right now my implementation could fail a few queries (for example SQLQuerySuite.normalize special floating numbers in subquery). 
   
   ```
   Invalid call to dataType on unresolved object
   org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to dataType on unresolved object
   	at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.dataType(unresolved.scala:137)
   	at org.apache.spark.sql.catalyst.expressions.BinaryOperator.checkInputDataTypes(Expression.scala:713)
   	at org.apache.spark.sql.catalyst.expressions.BinaryComparison.checkInputDataTypes(predicates.scala:908)
   	at org.apache.spark.sql.catalyst.analysis.RemoveTempResolvedColumn$.$anonfun$apply$82(Analyzer.scala:4262)
   
   ```
   
   I believe it is because I called `checkInputDataTypes()` on a binary comparator that has at least one side as `UnresolvedAttribute`. Do you know how to filter such case in my implementation? 
   
   I am not familiar with scala and our codebase. Any help will be much appreciated. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org