Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/10/27 13:40:05 UTC

[GitHub] [spark] wangyum commented on a change in pull request #28642: [SPARK-31809][SQL] Infer IsNotNull from some special equality join keys

wangyum commented on a change in pull request #28642:
URL: https://github.com/apache/spark/pull/28642#discussion_r737478319



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -1215,6 +1215,14 @@ object InferFiltersFromConstraints extends Rule[LogicalPlan]
     }
   }
 
+  // Whether the result of this expression may be null. For example: CAST(strCol AS double)
+  // We will infer an IsNotNull expression for this expression to avoid skew join.
+  private def resultMayBeNull(e: Expression): Boolean = e match {
+    case Cast(child, dataType, _, _) => !Cast.canUpCast(child.dataType, dataType)
+    case _: Coalesce => true
+    case _ => false
+  }

Review comment:
       @cloud-fan @HyukjinKwon This will not infer `IsNotNull` for every equality join key, only for keys whose result may be null (see `resultMayBeNull`). For example:
   
   Will infer | Will not infer
   -- | --
   cast(strCol AS double) = doubleCol | upper(strCol) = upperStrCol
   cast(bigintCol AS int) = intCol | cast(intCol AS bigint) = bigintCol
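
   The four rows above can be checked against a toy model of the rule. This is only a sketch: the types below are simplified stand-ins rather than Spark's Catalyst classes, `InferSketch` and `Upper` are hypothetical names, and `canUpCast` here is an assumed, much smaller approximation of `Cast.canUpCast` that accepts only identical types and int-to-bigint widening.

```scala
// Toy model, NOT Spark's classes: just enough structure to mirror the rule in the diff.
sealed trait DataType
case object StringType extends DataType
case object IntegerType extends DataType
case object LongType extends DataType
case object DoubleType extends DataType

sealed trait Expression { def dataType: DataType }
final case class Column(name: String, dataType: DataType) extends Expression
final case class Cast(child: Expression, dataType: DataType) extends Expression
final case class Upper(child: Expression) extends Expression {
  val dataType: DataType = StringType
}
final case class Coalesce(children: Seq[Expression]) extends Expression {
  def dataType: DataType = children.head.dataType
}

object InferSketch {
  // Assumed stand-in for Cast.canUpCast: identical types and int -> bigint only.
  def canUpCast(from: DataType, to: DataType): Boolean = (from, to) match {
    case (f, t) if f == t        => true
    case (IntegerType, LongType) => true
    case _                       => false
  }

  // Same shape as the rule in the diff: a cast that is not a safe up-cast
  // (and any Coalesce) is treated as an expression whose result may be null.
  def resultMayBeNull(e: Expression): Boolean = e match {
    case Cast(child, dataType) => !canUpCast(child.dataType, dataType)
    case _: Coalesce           => true
    case _                     => false
  }
}
```

   Under these assumptions, `cast(strCol AS double)` and `cast(bigintCol AS int)` are flagged as possibly null, while `upper(strCol)` and `cast(intCol AS bigint)` are not, which matches the two columns of the table.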
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
For additional commands, e-mail: reviews-help@spark.apache.org