Posted to reviews@spark.apache.org by srowen <gi...@git.apache.org> on 2018/08/10 02:01:54 UTC

[GitHub] spark pull request #22063: [WIP][SPARK-25044][SQL] Address translation of LM...

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22063#discussion_r209127319
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -2149,28 +2149,29 @@ class Analyzer(
     
           case p => p transformExpressionsUp {
     
    -        case udf @ ScalaUDF(func, _, inputs, _, _, _, _) =>
    -          val parameterTypes = ScalaReflection.getParameterTypes(func)
    -          assert(parameterTypes.length == inputs.length)
    -
    -          // TODO: skip null handling for not-nullable primitive inputs after we can completely
    -          // trust the `nullable` information.
    -          // (cls, expr) => cls.isPrimitive && expr.nullable
    -          val needsNullCheck = (cls: Class[_], expr: Expression) =>
    -            cls.isPrimitive && !expr.isInstanceOf[KnownNotNull]
    -          val inputsNullCheck = parameterTypes.zip(inputs)
    -            .filter { case (cls, expr) => needsNullCheck(cls, expr) }
    -            .map { case (_, expr) => IsNull(expr) }
    -            .reduceLeftOption[Expression]((e1, e2) => Or(e1, e2))
    -          // Once we add an `If` check above the udf, it is safe to mark those checked inputs
    -          // as not nullable (i.e., wrap them with `KnownNotNull`), because the null-returning
    -          // branch of `If` will be called if any of these checked inputs is null. Thus we can
    -          // prevent this rule from being applied repeatedly.
    -          val newInputs = parameterTypes.zip(inputs).map{ case (cls, expr) =>
    -            if (needsNullCheck(cls, expr)) KnownNotNull(expr) else expr }
    -          inputsNullCheck
    -            .map(If(_, Literal.create(null, udf.dataType), udf.copy(children = newInputs)))
    -            .getOrElse(udf)
    +        case udf@ScalaUDF(func, _, inputs, _, _, _, _, nullableTypes) =>
    +          if (nullableTypes.isEmpty) {
    --- End diff --
    
    This is probably the weak point: unless there is nullability info, the UDF plan is left untouched, but that's probably wrong in some cases.
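    
    For readers following the diff: the removed lines implement a rewrite that, for each UDF input whose Scala parameter type is primitive (and so cannot carry null), guards the whole call with an If that returns null when any such input is null, and tags the checked inputs KnownNotNull so the rule does not fire again on the rewritten plan. Below is a minimal, self-contained sketch of that rewrite using simplified stand-in classes (Input, Udf, addNullCheck, etc. are hypothetical; the real Catalyst Expression hierarchy is much richer):
    
        // Simplified stand-in AST; not the actual Catalyst classes.
        sealed trait Expression { def nullable: Boolean = true }
        case class Input(name: String) extends Expression
        case class IsNull(child: Expression) extends Expression
        case class Or(left: Expression, right: Expression) extends Expression
        case class KnownNotNull(child: Expression) extends Expression {
          override def nullable: Boolean = false
        }
        case class Literal(value: Any) extends Expression
        case class If(pred: Expression, t: Expression, f: Expression) extends Expression
        case class Udf(inputs: Seq[Expression]) extends Expression
    
        object NullCheckSketch {
          // Mirrors the removed lines: wrap a UDF whose primitive-typed inputs
          // may be null in an If that short-circuits to null.
          def addNullCheck(udf: Udf, parameterTypes: Seq[Class[_]]): Expression = {
            require(parameterTypes.length == udf.inputs.length)
            val needsNullCheck = (cls: Class[_], expr: Expression) =>
              cls.isPrimitive && !expr.isInstanceOf[KnownNotNull]
            // Combine the per-input checks into one predicate with Or.
            val inputsNullCheck = parameterTypes.zip(udf.inputs)
              .filter { case (cls, expr) => needsNullCheck(cls, expr) }
              .map { case (_, expr) => IsNull(expr) }
              .reduceLeftOption[Expression]((e1, e2) => Or(e1, e2))
            // Marking checked inputs KnownNotNull keeps the rule idempotent.
            val newInputs = parameterTypes.zip(udf.inputs).map { case (cls, expr) =>
              if (needsNullCheck(cls, expr)) KnownNotNull(expr) else expr
            }
            inputsNullCheck
              .map(If(_, Literal(null), udf.copy(inputs = newInputs)))
              .getOrElse(udf)
          }
    
          def main(args: Array[String]): Unit = {
            val udf = Udf(Seq(Input("a"), Input("b")))
            // Only the primitive Int parameter gets a null guard:
            // If(IsNull(Input(a)), Literal(null), Udf(List(KnownNotNull(Input(a)), Input(b))))
            println(addNullCheck(udf, Seq(classOf[Int], classOf[String])))
          }
        }
    
    The PR replaces the reflective ScalaReflection.getParameterTypes call with the per-parameter nullableTypes carried on ScalaUDF itself, which is what makes the nullableTypes.isEmpty fallback above the point of concern: with no nullability info, no guard is added at all.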


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org