You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by mgaido91 <gi...@git.apache.org> on 2017/11/14 22:30:50 UTC

[GitHub] spark pull request #19752: [SPARK-22520][SQL] Support code generation for la...

GitHub user mgaido91 opened a pull request:

    https://github.com/apache/spark/pull/19752

    [SPARK-22520][SQL] Support code generation for large CaseWhen

    ## What changes were proposed in this pull request?
    
    Code generation is disabled for CaseWhen when the number of branches is higher than `spark.sql.codegen.maxCaseBranches` (which defaults to 20). This was done to prevent the well known 64KB method limit exception.
    This PR proposes to support code generation also in those cases (without causing exceptions of course). As a side effect, we could get rid of the `spark.sql.codegen.maxCaseBranches` configuration.
    
    ## How was this patch tested?
    
    existing UTs


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mgaido91/spark SPARK-22520

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19752.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19752
    
----
commit 98eaae9436adf63ec3023ee077f2fff8e23dfa35
Author: Marco Gaido <mg...@hortonworks.com>
Date:   2017-11-14T17:41:00Z

    [SPARK-22520][SQL] Support code generation for large CaseWhen

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19752: [SPARK-22520][SQL] Support code generation for large Cas...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19752
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84198/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19752: [SPARK-22520][SQL] Support code generation for large Cas...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19752
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84229/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19752: [SPARK-22520][SQL] Support code generation for la...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19752#discussion_r153079696
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
    @@ -211,111 +231,62 @@ abstract class CaseWhenBase(
         val elseCase = elseValue.map(" ELSE " + _.sql).getOrElse("")
         "CASE" + cases + elseCase + " END"
       }
    -}
    -
    -
    -/**
    - * Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END".
    - * When a = true, returns b; when c = true, returns d; else returns e.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -// scalastyle:off line.size.limit
    -@ExpressionDescription(
    -  usage = "CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END - When `expr1` = true, returns `expr2`; else when `expr3` = true, returns `expr4`; else returns `expr5`.",
    -  arguments = """
    -    Arguments:
    -      * expr1, expr3 - the branch condition expressions should all be boolean type.
    -      * expr2, expr4, expr5 - the branch value expressions and else value expression should all be
    -          same type or coercible to a common type.
    -  """,
    -  examples = """
    -    Examples:
    -      > SELECT CASE WHEN 1 > 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       1
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       2
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 ELSE null END;
    -       NULL
    -  """)
    -// scalastyle:on line.size.limit
    -case class CaseWhen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with CodegenFallback with Serializable {
    -
    -  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    super[CodegenFallback].doGenCode(ctx, ev)
    -  }
    -
    -  def toCodegen(): CaseWhenCodegen = {
    -    CaseWhenCodegen(branches, elseValue)
    -  }
    -}
    -
    -/**
    - * CaseWhen expression used when code generation condition is satisfied.
    - * OptimizeCodegen optimizer replaces CaseWhen into CaseWhenCodegen.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -case class CaseWhenCodegen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with Serializable {
     
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    // Generate code that looks like:
    -    //
    -    // condA = ...
    -    // if (condA) {
    -    //   valueA
    -    // } else {
    -    //   condB = ...
    -    //   if (condB) {
    -    //     valueB
    -    //   } else {
    -    //     condC = ...
    -    //     if (condC) {
    -    //       valueC
    -    //     } else {
    -    //       elseValue
    -    //     }
    -    //   }
    -    // }
    +    val conditionMet = ctx.freshName("caseWhenConditionMet")
    +    ctx.addMutableState("boolean", ev.isNull, "")
    +    ctx.addMutableState(ctx.javaType(dataType), ev.value, "")
         val cases = branches.map { case (condExpr, valueExpr) =>
           val cond = condExpr.genCode(ctx)
           val res = valueExpr.genCode(ctx)
           s"""
    -        ${cond.code}
    -        if (!${cond.isNull} && ${cond.value}) {
    -          ${res.code}
    -          ${ev.isNull} = ${res.isNull};
    -          ${ev.value} = ${res.value};
    +        if(!$conditionMet) {
    +          ${cond.code}
    +          if (!${cond.isNull} && ${cond.value}) {
    +            ${res.code}
    +            ${ev.isNull} = ${res.isNull};
    +            ${ev.value} = ${res.value};
    +            $conditionMet = true;
    +          }
             }
           """
         }
     
    -    var generatedCode = cases.mkString("", "\nelse {\n", "\nelse {\n")
    -
    -    elseValue.foreach { elseExpr =>
    +    val elseCode = elseValue.map { elseExpr =>
           val res = elseExpr.genCode(ctx)
    -      generatedCode +=
    -        s"""
    +      s"""
    +        if(!$conditionMet) {
               ${res.code}
               ${ev.isNull} = ${res.isNull};
               ${ev.value} = ${res.value};
    -        """
    +        }
    +      """
         }
     
    -    generatedCode += "}\n" * cases.size
    +    val allConditions = cases ++ elseCode
    +
    +    val code = if (ctx.INPUT_ROW == null || ctx.currentVars != null) {
    +      allConditions.mkString("\n")
    +    } else {
    +      ctx.splitExpressions(allConditions, "caseWhen",
    +        ("InternalRow", ctx.INPUT_ROW) :: ("boolean", conditionMet) :: Nil, returnType = "boolean",
    --- End diff --
    
    `ctx.JAVA_BOOLEAN`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19752: [SPARK-22520][SQL] Support code generation for la...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19752#discussion_r153081270
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
    @@ -211,111 +231,62 @@ abstract class CaseWhenBase(
         val elseCase = elseValue.map(" ELSE " + _.sql).getOrElse("")
         "CASE" + cases + elseCase + " END"
       }
    -}
    -
    -
    -/**
    - * Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END".
    - * When a = true, returns b; when c = true, returns d; else returns e.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -// scalastyle:off line.size.limit
    -@ExpressionDescription(
    -  usage = "CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END - When `expr1` = true, returns `expr2`; else when `expr3` = true, returns `expr4`; else returns `expr5`.",
    -  arguments = """
    -    Arguments:
    -      * expr1, expr3 - the branch condition expressions should all be boolean type.
    -      * expr2, expr4, expr5 - the branch value expressions and else value expression should all be
    -          same type or coercible to a common type.
    -  """,
    -  examples = """
    -    Examples:
    -      > SELECT CASE WHEN 1 > 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       1
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       2
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 ELSE null END;
    -       NULL
    -  """)
    -// scalastyle:on line.size.limit
    -case class CaseWhen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with CodegenFallback with Serializable {
    -
    -  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    super[CodegenFallback].doGenCode(ctx, ev)
    -  }
    -
    -  def toCodegen(): CaseWhenCodegen = {
    -    CaseWhenCodegen(branches, elseValue)
    -  }
    -}
    -
    -/**
    - * CaseWhen expression used when code generation condition is satisfied.
    - * OptimizeCodegen optimizer replaces CaseWhen into CaseWhenCodegen.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -case class CaseWhenCodegen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with Serializable {
     
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    // Generate code that looks like:
    -    //
    -    // condA = ...
    -    // if (condA) {
    -    //   valueA
    -    // } else {
    -    //   condB = ...
    -    //   if (condB) {
    -    //     valueB
    -    //   } else {
    -    //     condC = ...
    -    //     if (condC) {
    -    //       valueC
    -    //     } else {
    -    //       elseValue
    -    //     }
    -    //   }
    -    // }
    +    val conditionMet = ctx.freshName("caseWhenConditionMet")
    +    ctx.addMutableState("boolean", ev.isNull, "")
    +    ctx.addMutableState(ctx.javaType(dataType), ev.value, "")
         val cases = branches.map { case (condExpr, valueExpr) =>
           val cond = condExpr.genCode(ctx)
           val res = valueExpr.genCode(ctx)
           s"""
    -        ${cond.code}
    -        if (!${cond.isNull} && ${cond.value}) {
    -          ${res.code}
    -          ${ev.isNull} = ${res.isNull};
    -          ${ev.value} = ${res.value};
    +        if(!$conditionMet) {
    +          ${cond.code}
    +          if (!${cond.isNull} && ${cond.value}) {
    +            ${res.code}
    +            ${ev.isNull} = ${res.isNull};
    +            ${ev.value} = ${res.value};
    +            $conditionMet = true;
    +          }
             }
           """
         }
     
    -    var generatedCode = cases.mkString("", "\nelse {\n", "\nelse {\n")
    -
    -    elseValue.foreach { elseExpr =>
    +    val elseCode = elseValue.map { elseExpr =>
           val res = elseExpr.genCode(ctx)
    -      generatedCode +=
    -        s"""
    +      s"""
    +        if(!$conditionMet) {
               ${res.code}
               ${ev.isNull} = ${res.isNull};
               ${ev.value} = ${res.value};
    -        """
    +        }
    +      """
         }
     
    -    generatedCode += "}\n" * cases.size
    +    val allConditions = cases ++ elseCode
    +
    +    val code = if (ctx.INPUT_ROW == null || ctx.currentVars != null) {
    +      allConditions.mkString("\n")
    +    } else {
    +      ctx.splitExpressions(allConditions, "caseWhen",
    +        ("InternalRow", ctx.INPUT_ROW) :: ("boolean", conditionMet) :: Nil, returnType = "boolean",
    +        makeSplitFunction = {
    +          func =>
    +            s"""
    +              $func
    +              return $conditionMet;
    +            """
    +        },
    +        foldFunctions = { funcCalls =>
    +          funcCalls.map(funcCall => s"$conditionMet = $funcCall;").mkString("\n")
    --- End diff --
    
    you are right, but we already have checks about it inside the functions. Do we need also to check it outside?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19752: [SPARK-22520][SQL] Support code generation for large Cas...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19752
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84168/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19752: [SPARK-22520][SQL] Support code generation for la...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19752#discussion_r153079595
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
    @@ -211,111 +231,62 @@ abstract class CaseWhenBase(
         val elseCase = elseValue.map(" ELSE " + _.sql).getOrElse("")
         "CASE" + cases + elseCase + " END"
       }
    -}
    -
    -
    -/**
    - * Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END".
    - * When a = true, returns b; when c = true, returns d; else returns e.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -// scalastyle:off line.size.limit
    -@ExpressionDescription(
    -  usage = "CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END - When `expr1` = true, returns `expr2`; else when `expr3` = true, returns `expr4`; else returns `expr5`.",
    -  arguments = """
    -    Arguments:
    -      * expr1, expr3 - the branch condition expressions should all be boolean type.
    -      * expr2, expr4, expr5 - the branch value expressions and else value expression should all be
    -          same type or coercible to a common type.
    -  """,
    -  examples = """
    -    Examples:
    -      > SELECT CASE WHEN 1 > 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       1
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       2
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 ELSE null END;
    -       NULL
    -  """)
    -// scalastyle:on line.size.limit
    -case class CaseWhen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with CodegenFallback with Serializable {
    -
    -  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    super[CodegenFallback].doGenCode(ctx, ev)
    -  }
    -
    -  def toCodegen(): CaseWhenCodegen = {
    -    CaseWhenCodegen(branches, elseValue)
    -  }
    -}
    -
    -/**
    - * CaseWhen expression used when code generation condition is satisfied.
    - * OptimizeCodegen optimizer replaces CaseWhen into CaseWhenCodegen.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -case class CaseWhenCodegen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with Serializable {
     
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    // Generate code that looks like:
    -    //
    -    // condA = ...
    -    // if (condA) {
    -    //   valueA
    -    // } else {
    -    //   condB = ...
    -    //   if (condB) {
    -    //     valueB
    -    //   } else {
    -    //     condC = ...
    -    //     if (condC) {
    -    //       valueC
    -    //     } else {
    -    //       elseValue
    -    //     }
    -    //   }
    -    // }
    +    val conditionMet = ctx.freshName("caseWhenConditionMet")
    +    ctx.addMutableState("boolean", ev.isNull, "")
    --- End diff --
    
    `ctx.JAVA_BOOLEAN`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19752: [SPARK-22520][SQL] Support code generation for large Cas...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19752
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84224/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19752: [SPARK-22520][SQL] Support code generation for large Cas...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19752
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19752: [SPARK-22520][SQL] Support code generation for la...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19752#discussion_r153118387
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
    @@ -211,111 +231,73 @@ abstract class CaseWhenBase(
         val elseCase = elseValue.map(" ELSE " + _.sql).getOrElse("")
         "CASE" + cases + elseCase + " END"
       }
    -}
    -
    -
    -/**
    - * Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END".
    - * When a = true, returns b; when c = true, returns d; else returns e.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -// scalastyle:off line.size.limit
    -@ExpressionDescription(
    -  usage = "CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END - When `expr1` = true, returns `expr2`; else when `expr3` = true, returns `expr4`; else returns `expr5`.",
    -  arguments = """
    -    Arguments:
    -      * expr1, expr3 - the branch condition expressions should all be boolean type.
    -      * expr2, expr4, expr5 - the branch value expressions and else value expression should all be
    -          same type or coercible to a common type.
    -  """,
    -  examples = """
    -    Examples:
    -      > SELECT CASE WHEN 1 > 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       1
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       2
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 ELSE null END;
    -       NULL
    -  """)
    -// scalastyle:on line.size.limit
    -case class CaseWhen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with CodegenFallback with Serializable {
    -
    -  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    super[CodegenFallback].doGenCode(ctx, ev)
    -  }
    -
    -  def toCodegen(): CaseWhenCodegen = {
    -    CaseWhenCodegen(branches, elseValue)
    -  }
    -}
    -
    -/**
    - * CaseWhen expression used when code generation condition is satisfied.
    - * OptimizeCodegen optimizer replaces CaseWhen into CaseWhenCodegen.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -case class CaseWhenCodegen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with Serializable {
     
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    // Generate code that looks like:
    -    //
    -    // condA = ...
    -    // if (condA) {
    -    //   valueA
    -    // } else {
    -    //   condB = ...
    -    //   if (condB) {
    -    //     valueB
    -    //   } else {
    -    //     condC = ...
    -    //     if (condC) {
    -    //       valueC
    -    //     } else {
    -    //       elseValue
    -    //     }
    -    //   }
    -    // }
    +    // This variable represents whether the first successful condition is met or not.
    +    // It is initialized to `false` and it is set to `true` when the first condition which
    +    // evaluates to `true` is met and therefore is not needed to go on anymore on the computation
    +    // of the following conditions.
    +    val conditionMet = ctx.freshName("caseWhenConditionMet")
    +    ctx.addMutableState(ctx.JAVA_BOOLEAN, ev.isNull, "")
    +    ctx.addMutableState(ctx.javaType(dataType), ev.value, "")
         val cases = branches.map { case (condExpr, valueExpr) =>
           val cond = condExpr.genCode(ctx)
           val res = valueExpr.genCode(ctx)
           s"""
    -        ${cond.code}
    -        if (!${cond.isNull} && ${cond.value}) {
    -          ${res.code}
    -          ${ev.isNull} = ${res.isNull};
    -          ${ev.value} = ${res.value};
    +        if(!$conditionMet) {
    +          ${cond.code}
    +          if (!${cond.isNull} && ${cond.value}) {
    +            ${res.code}
    +            ${ev.isNull} = ${res.isNull};
    +            ${ev.value} = ${res.value};
    +            $conditionMet = true;
    +          }
             }
           """
         }
     
    -    var generatedCode = cases.mkString("", "\nelse {\n", "\nelse {\n")
    -
    -    elseValue.foreach { elseExpr =>
    +    val elseCode = elseValue.map { elseExpr =>
           val res = elseExpr.genCode(ctx)
    -      generatedCode +=
    -        s"""
    +      s"""
    +        if(!$conditionMet) {
               ${res.code}
               ${ev.isNull} = ${res.isNull};
               ${ev.value} = ${res.value};
    -        """
    +        }
    +      """
         }
     
    -    generatedCode += "}\n" * cases.size
    +    val allConditions = cases ++ elseCode
    +
    +    val code = if (ctx.INPUT_ROW == null || ctx.currentVars != null) {
    +        allConditions.mkString("\n")
    +      } else {
    +        ctx.splitExpressions(allConditions, "caseWhen",
    +          ("InternalRow", ctx.INPUT_ROW) :: (ctx.JAVA_BOOLEAN, conditionMet) :: Nil,
    +          returnType = ctx.JAVA_BOOLEAN,
    +          makeSplitFunction = {
    +            func =>
    +              s"""
    +                $func
    +                return $conditionMet;
    +              """
    +          },
    +          foldFunctions = { funcCalls =>
    +            funcCalls.map { funcCall =>
    +              s"""
    +                $conditionMet = $funcCall;
    +                if ($conditionMet) {
    +                  continue;
    +                }"""
    +            }.mkString("do {", "", "\n} while (false);")
    --- End diff --
    
    no, since there is a newline at the beginning of each expression.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19752: [SPARK-22520][SQL] Support code generation for large Cas...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19752
  
    **[Test build #84206 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84206/testReport)** for PR 19752 at commit [`f4c7896`](https://github.com/apache/spark/commit/f4c78965a8cee34e3be8b9d8e264f2d6eb0d27f5).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19752: [SPARK-22520][SQL] Support code generation for la...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19752#discussion_r153082196
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
    @@ -141,14 +141,34 @@ case class If(predicate: Expression, trueValue: Expression, falseValue: Expressi
     }
     
     /**
    - * Abstract parent class for common logic in CaseWhen and CaseWhenCodegen.
    + * Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END".
    + * When a = true, returns b; when c = true, returns d; else returns e.
      *
      * @param branches seq of (branch condition, branch value)
      * @param elseValue optional value for the else branch
      */
    -abstract class CaseWhenBase(
    +// scalastyle:off line.size.limit
    +@ExpressionDescription(
    +  usage = "CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END - When `expr1` = true, returns `expr2`; else when `expr3` = true, returns `expr4`; else returns `expr5`.",
    +  arguments = """
    +    Arguments:
    +      * expr1, expr3 - the branch condition expressions should all be boolean type.
    +      * expr2, expr4, expr5 - the branch value expressions and else value expression should all be
    +          same type or coercible to a common type.
    +  """,
    +  examples = """
    +    Examples:
    +      > SELECT CASE WHEN 1 > 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    +       1
    +      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    +       2
    +      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 ELSE null END;
    --- End diff --
    
    I confirm that Hive returns NULL. Then I am updating the description as requested.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19752: [SPARK-22520][SQL] Support code generation for la...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19752#discussion_r153081226
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
    @@ -141,14 +141,34 @@ case class If(predicate: Expression, trueValue: Expression, falseValue: Expressi
     }
     
     /**
    - * Abstract parent class for common logic in CaseWhen and CaseWhenCodegen.
    + * Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END".
    + * When a = true, returns b; when c = true, returns d; else returns e.
      *
      * @param branches seq of (branch condition, branch value)
      * @param elseValue optional value for the else branch
      */
    -abstract class CaseWhenBase(
    +// scalastyle:off line.size.limit
    +@ExpressionDescription(
    +  usage = "CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END - When `expr1` = true, returns `expr2`; else when `expr3` = true, returns `expr4`; else returns `expr5`.",
    +  arguments = """
    +    Arguments:
    +      * expr1, expr3 - the branch condition expressions should all be boolean type.
    +      * expr2, expr4, expr5 - the branch value expressions and else value expression should all be
    +          same type or coercible to a common type.
    +  """,
    +  examples = """
    +    Examples:
    +      > SELECT CASE WHEN 1 > 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    +       1
    +      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    +       2
    +      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 ELSE null END;
    --- End diff --
    
    I can follow your first suggestion and I can test this on hive, but actually I haven't changed this part of code. I will post ASAP the result in Hive.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19752: [SPARK-22520][SQL] Support code generation for large Cas...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19752
  
    LGTM cc @cloud-fan @kiszk 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19752: [SPARK-22520][SQL] Support code generation for la...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19752#discussion_r153101065
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
    @@ -211,111 +231,73 @@ abstract class CaseWhenBase(
         val elseCase = elseValue.map(" ELSE " + _.sql).getOrElse("")
         "CASE" + cases + elseCase + " END"
       }
    -}
    -
    -
    -/**
    - * Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END".
    - * When a = true, returns b; when c = true, returns d; else returns e.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -// scalastyle:off line.size.limit
    -@ExpressionDescription(
    -  usage = "CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END - When `expr1` = true, returns `expr2`; else when `expr3` = true, returns `expr4`; else returns `expr5`.",
    -  arguments = """
    -    Arguments:
    -      * expr1, expr3 - the branch condition expressions should all be boolean type.
    -      * expr2, expr4, expr5 - the branch value expressions and else value expression should all be
    -          same type or coercible to a common type.
    -  """,
    -  examples = """
    -    Examples:
    -      > SELECT CASE WHEN 1 > 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       1
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       2
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 ELSE null END;
    -       NULL
    -  """)
    -// scalastyle:on line.size.limit
    -case class CaseWhen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with CodegenFallback with Serializable {
    -
    -  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    super[CodegenFallback].doGenCode(ctx, ev)
    -  }
    -
    -  def toCodegen(): CaseWhenCodegen = {
    -    CaseWhenCodegen(branches, elseValue)
    -  }
    -}
    -
    -/**
    - * CaseWhen expression used when code generation condition is satisfied.
    - * OptimizeCodegen optimizer replaces CaseWhen into CaseWhenCodegen.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -case class CaseWhenCodegen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with Serializable {
     
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    // Generate code that looks like:
    --- End diff --
    
    shall we keep this comment and update it?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19752: [SPARK-22520][SQL] Support code generation for la...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19752#discussion_r153100103
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
    @@ -211,111 +231,73 @@ abstract class CaseWhenBase(
         val elseCase = elseValue.map(" ELSE " + _.sql).getOrElse("")
         "CASE" + cases + elseCase + " END"
       }
    -}
    -
    -
    -/**
    - * Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END".
    - * When a = true, returns b; when c = true, returns d; else returns e.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -// scalastyle:off line.size.limit
    -@ExpressionDescription(
    -  usage = "CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END - When `expr1` = true, returns `expr2`; else when `expr3` = true, returns `expr4`; else returns `expr5`.",
    -  arguments = """
    -    Arguments:
    -      * expr1, expr3 - the branch condition expressions should all be boolean type.
    -      * expr2, expr4, expr5 - the branch value expressions and else value expression should all be
    -          same type or coercible to a common type.
    -  """,
    -  examples = """
    -    Examples:
    -      > SELECT CASE WHEN 1 > 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       1
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       2
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 ELSE null END;
    -       NULL
    -  """)
    -// scalastyle:on line.size.limit
    -case class CaseWhen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with CodegenFallback with Serializable {
    -
    -  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    super[CodegenFallback].doGenCode(ctx, ev)
    -  }
    -
    -  def toCodegen(): CaseWhenCodegen = {
    -    CaseWhenCodegen(branches, elseValue)
    -  }
    -}
    -
    -/**
    - * CaseWhen expression used when code generation condition is satisfied.
    - * OptimizeCodegen optimizer replaces CaseWhen into CaseWhenCodegen.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -case class CaseWhenCodegen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with Serializable {
     
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    // Generate code that looks like:
    -    //
    -    // condA = ...
    -    // if (condA) {
    -    //   valueA
    -    // } else {
    -    //   condB = ...
    -    //   if (condB) {
    -    //     valueB
    -    //   } else {
    -    //     condC = ...
    -    //     if (condC) {
    -    //       valueC
    -    //     } else {
    -    //       elseValue
    -    //     }
    -    //   }
    -    // }
    +    // This variable represents whether the first successful condition is met or not.
    +    // It is initialized to `false` and it is set to `true` when the first condition which
    +    // evaluates to `true` is met and therefore is not needed to go on anymore on the computation
    +    // of the following conditions.
    +    val conditionMet = ctx.freshName("caseWhenConditionMet")
    +    ctx.addMutableState(ctx.JAVA_BOOLEAN, ev.isNull, "")
    --- End diff --
    
    nit: `ctx.addMutableState(ctx.JAVA_BOOLEAN, ev.isNull)`, as empty string is the default value of the 3rd parameter.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19752: [SPARK-22520][SQL] Support code generation for la...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19752#discussion_r152865462
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
    @@ -211,111 +231,61 @@ abstract class CaseWhenBase(
         val elseCase = elseValue.map(" ELSE " + _.sql).getOrElse("")
         "CASE" + cases + elseCase + " END"
       }
    -}
    -
    -
    -/**
    - * Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END".
    - * When a = true, returns b; when c = true, returns d; else returns e.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -// scalastyle:off line.size.limit
    -@ExpressionDescription(
    -  usage = "CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END - When `expr1` = true, returns `expr2`; else when `expr3` = true, returns `expr4`; else returns `expr5`.",
    -  arguments = """
    -    Arguments:
    -      * expr1, expr3 - the branch condition expressions should all be boolean type.
    -      * expr2, expr4, expr5 - the branch value expressions and else value expression should all be
    -          same type or coercible to a common type.
    -  """,
    -  examples = """
    -    Examples:
    -      > SELECT CASE WHEN 1 > 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       1
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       2
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 ELSE null END;
    -       NULL
    -  """)
    -// scalastyle:on line.size.limit
    -case class CaseWhen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with CodegenFallback with Serializable {
    -
    -  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    super[CodegenFallback].doGenCode(ctx, ev)
    -  }
    -
    -  def toCodegen(): CaseWhenCodegen = {
    -    CaseWhenCodegen(branches, elseValue)
    -  }
    -}
    -
    -/**
    - * CaseWhen expression used when code generation condition is satisfied.
    - * OptimizeCodegen optimizer replaces CaseWhen into CaseWhenCodegen.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -case class CaseWhenCodegen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with Serializable {
     
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    // Generate code that looks like:
    -    //
    -    // condA = ...
    -    // if (condA) {
    -    //   valueA
    -    // } else {
    -    //   condB = ...
    -    //   if (condB) {
    -    //     valueB
    -    //   } else {
    -    //     condC = ...
    -    //     if (condC) {
    -    //       valueC
    -    //     } else {
    -    //       elseValue
    -    //     }
    -    //   }
    -    // }
    +    val conditionMet = ctx.freshName("caseWhenConditionMet")
    +    ctx.addMutableState("boolean", ev.isNull, "")
    +    ctx.addMutableState(ctx.javaType(dataType), ev.value, "")
         val cases = branches.map { case (condExpr, valueExpr) =>
           val cond = condExpr.genCode(ctx)
           val res = valueExpr.genCode(ctx)
           s"""
    -        ${cond.code}
    -        if (!${cond.isNull} && ${cond.value}) {
    -          ${res.code}
    -          ${ev.isNull} = ${res.isNull};
    -          ${ev.value} = ${res.value};
    +        if(!$conditionMet) {
    +          ${cond.code}
    +          if (!${cond.isNull} && ${cond.value}) {
    +            ${res.code}
    +            ${ev.isNull} = ${res.isNull};
    +            ${ev.value} = ${res.value};
    +            $conditionMet = true;
    +          }
             }
           """
         }
     
    -    var generatedCode = cases.mkString("", "\nelse {\n", "\nelse {\n")
    -
    -    elseValue.foreach { elseExpr =>
    +    val elseCode = elseValue.map { elseExpr =>
           val res = elseExpr.genCode(ctx)
    -      generatedCode +=
    -        s"""
    +      s"""
    +        if(!$conditionMet) {
               ${res.code}
               ${ev.isNull} = ${res.isNull};
               ${ev.value} = ${res.value};
    -        """
    -    }
    +        }
    +      """
    +    }.getOrElse("")
     
    -    generatedCode += "}\n" * cases.size
    +    val casesCode = if (ctx.INPUT_ROW == null || ctx.currentVars != null) {
    +      cases.mkString("\n")
    +    } else {
    +      ctx.splitExpressions(cases, "caseWhen",
    --- End diff --
    
    I can reuse the same UT added in #19797 for the 64KB limit, if it is ok for you.
    
    As far as the performance is regarded, I'd need to create a test with many rows, otherwise the overhead is higher than the execution time and such a case is not going to finish in few seconds. Do you want me to post here in the PR some code and the times of execution before and after the PR, without adding it as a test?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19752: [SPARK-22520][SQL] Support code generation for large Cas...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19752
  
    **[Test build #83866 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83866/testReport)** for PR 19752 at commit [`98eaae9`](https://github.com/apache/spark/commit/98eaae9436adf63ec3023ee077f2fff8e23dfa35).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19752: [SPARK-22520][SQL] Support code generation for large Cas...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19752
  
    **[Test build #84212 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84212/testReport)** for PR 19752 at commit [`5adb513`](https://github.com/apache/spark/commit/5adb513ddba69e02b83aa47b69f45c73753a8457).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19752: [SPARK-22520][SQL] Support code generation for la...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19752#discussion_r152537080
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
    @@ -211,111 +231,61 @@ abstract class CaseWhenBase(
         val elseCase = elseValue.map(" ELSE " + _.sql).getOrElse("")
         "CASE" + cases + elseCase + " END"
       }
    -}
    -
    -
    -/**
    - * Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END".
    - * When a = true, returns b; when c = true, returns d; else returns e.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -// scalastyle:off line.size.limit
    -@ExpressionDescription(
    -  usage = "CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END - When `expr1` = true, returns `expr2`; else when `expr3` = true, returns `expr4`; else returns `expr5`.",
    -  arguments = """
    -    Arguments:
    -      * expr1, expr3 - the branch condition expressions should all be boolean type.
    -      * expr2, expr4, expr5 - the branch value expressions and else value expression should all be
    -          same type or coercible to a common type.
    -  """,
    -  examples = """
    -    Examples:
    -      > SELECT CASE WHEN 1 > 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       1
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       2
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 ELSE null END;
    -       NULL
    -  """)
    -// scalastyle:on line.size.limit
    -case class CaseWhen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with CodegenFallback with Serializable {
    -
    -  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    super[CodegenFallback].doGenCode(ctx, ev)
    -  }
    -
    -  def toCodegen(): CaseWhenCodegen = {
    -    CaseWhenCodegen(branches, elseValue)
    -  }
    -}
    -
    -/**
    - * CaseWhen expression used when code generation condition is satisfied.
    - * OptimizeCodegen optimizer replaces CaseWhen into CaseWhenCodegen.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -case class CaseWhenCodegen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with Serializable {
     
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    // Generate code that looks like:
    -    //
    -    // condA = ...
    -    // if (condA) {
    -    //   valueA
    -    // } else {
    -    //   condB = ...
    -    //   if (condB) {
    -    //     valueB
    -    //   } else {
    -    //     condC = ...
    -    //     if (condC) {
    -    //       valueC
    -    //     } else {
    -    //       elseValue
    -    //     }
    -    //   }
    -    // }
    +    val conditionMet = ctx.freshName("caseWhenConditionMet")
    +    ctx.addMutableState("boolean", ev.isNull, "")
    +    ctx.addMutableState(ctx.javaType(dataType), ev.value, "")
         val cases = branches.map { case (condExpr, valueExpr) =>
           val cond = condExpr.genCode(ctx)
           val res = valueExpr.genCode(ctx)
           s"""
    -        ${cond.code}
    -        if (!${cond.isNull} && ${cond.value}) {
    -          ${res.code}
    -          ${ev.isNull} = ${res.isNull};
    -          ${ev.value} = ${res.value};
    +        if(!$conditionMet) {
    +          ${cond.code}
    +          if (!${cond.isNull} && ${cond.value}) {
    +            ${res.code}
    +            ${ev.isNull} = ${res.isNull};
    +            ${ev.value} = ${res.value};
    +            $conditionMet = true;
    +          }
             }
           """
         }
     
    -    var generatedCode = cases.mkString("", "\nelse {\n", "\nelse {\n")
    -
    -    elseValue.foreach { elseExpr =>
    +    val elseCode = elseValue.map { elseExpr =>
           val res = elseExpr.genCode(ctx)
    -      generatedCode +=
    -        s"""
    +      s"""
    +        if(!$conditionMet) {
               ${res.code}
               ${ev.isNull} = ${res.isNull};
               ${ev.value} = ${res.value};
    -        """
    -    }
    +        }
    +      """
    +    }.getOrElse("")
     
    -    generatedCode += "}\n" * cases.size
    +    val casesCode = if (ctx.INPUT_ROW == null || ctx.currentVars != null) {
    +      cases.mkString("\n")
    +    } else {
    +      ctx.splitExpressions(cases, "caseWhen",
    --- End diff --
    
    I think that we need to call it, indeed, as explained in this comment: https://github.com/apache/spark/pull/19767#issuecomment-345176286


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19752: [SPARK-22520][SQL] Support code generation for large Cas...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19752
  
    **[Test build #84206 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84206/testReport)** for PR 19752 at commit [`f4c7896`](https://github.com/apache/spark/commit/f4c78965a8cee34e3be8b9d8e264f2d6eb0d27f5).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19752: [SPARK-22520][SQL] Support code generation for large Cas...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/19752
  
    Sure, it may have some overlaps with #18641. I will review this after #18641 due to avoiding a conflict.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19752: [SPARK-22520][SQL] Support code generation for la...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19752#discussion_r153079959
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
    @@ -141,14 +141,34 @@ case class If(predicate: Expression, trueValue: Expression, falseValue: Expressi
     }
     
     /**
    - * Abstract parent class for common logic in CaseWhen and CaseWhenCodegen.
    + * Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END".
    + * When a = true, returns b; when c = true, returns d; else returns e.
      *
      * @param branches seq of (branch condition, branch value)
      * @param elseValue optional value for the else branch
      */
    -abstract class CaseWhenBase(
    +// scalastyle:off line.size.limit
    +@ExpressionDescription(
    +  usage = "CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END - When `expr1` = true, returns `expr2`; else when `expr3` = true, returns `expr4`; else returns `expr5`.",
    +  arguments = """
    +    Arguments:
    +      * expr1, expr3 - the branch condition expressions should all be boolean type.
    +      * expr2, expr4, expr5 - the branch value expressions and else value expression should all be
    +          same type or coercible to a common type.
    +  """,
    +  examples = """
    +    Examples:
    +      > SELECT CASE WHEN 1 > 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    +       1
    +      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    +       2
    +      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 ELSE null END;
    --- End diff --
    
    Could you double check Hive returns NULL in the following case?
    > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 END;



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19752: [SPARK-22520][SQL] Support code generation for la...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19752#discussion_r153239636
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
    @@ -158,111 +178,86 @@ abstract class CaseWhenBase(
         val elseCase = elseValue.map(" ELSE " + _.sql).getOrElse("")
         "CASE" + cases + elseCase + " END"
       }
    -}
    -
    -
    -/**
    - * Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END".
    - * When a = true, returns b; when c = true, returns d; else returns e.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -// scalastyle:off line.size.limit
    -@ExpressionDescription(
    -  usage = "CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END - When `expr1` = true, returns `expr2`; else when `expr3` = true, returns `expr4`; else returns `expr5`.",
    -  arguments = """
    -    Arguments:
    -      * expr1, expr3 - the branch condition expressions should all be boolean type.
    -      * expr2, expr4, expr5 - the branch value expressions and else value expression should all be
    -          same type or coercible to a common type.
    -  """,
    -  examples = """
    -    Examples:
    -      > SELECT CASE WHEN 1 > 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       1
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       2
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 ELSE null END;
    -       NULL
    -  """)
    -// scalastyle:on line.size.limit
    -case class CaseWhen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with CodegenFallback with Serializable {
    -
    -  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    super[CodegenFallback].doGenCode(ctx, ev)
    -  }
    -
    -  def toCodegen(): CaseWhenCodegen = {
    -    CaseWhenCodegen(branches, elseValue)
    -  }
    -}
    -
    -/**
    - * CaseWhen expression used when code generation condition is satisfied.
    - * OptimizeCodegen optimizer replaces CaseWhen into CaseWhenCodegen.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -case class CaseWhenCodegen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with Serializable {
     
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    // Generate code that looks like:
    -    //
    -    // condA = ...
    -    // if (condA) {
    -    //   valueA
    -    // } else {
    -    //   condB = ...
    -    //   if (condB) {
    -    //     valueB
    -    //   } else {
    -    //     condC = ...
    -    //     if (condC) {
    -    //       valueC
    -    //     } else {
    -    //       elseValue
    -    //     }
    -    //   }
    -    // }
    +    // This variable represents whether the first successful condition is met or not.
    +    // It is initialized to `false` and it is set to `true` when the first condition which
    +    // evaluates to `true` is met and therefore is not needed to go on anymore on the computation
    +    // of the following conditions.
    +    val conditionMet = ctx.freshName("caseWhenConditionMet")
    +    ctx.addMutableState(ctx.JAVA_BOOLEAN, ev.isNull)
    +    ctx.addMutableState(ctx.javaType(dataType), ev.value)
         val cases = branches.map { case (condExpr, valueExpr) =>
           val cond = condExpr.genCode(ctx)
           val res = valueExpr.genCode(ctx)
           s"""
    -        ${cond.code}
    -        if (!${cond.isNull} && ${cond.value}) {
    -          ${res.code}
    -          ${ev.isNull} = ${res.isNull};
    -          ${ev.value} = ${res.value};
    +        if(!$conditionMet) {
    +          ${cond.code}
    +          if (!${cond.isNull} && ${cond.value}) {
    +            ${res.code}
    +            ${ev.isNull} = ${res.isNull};
    +            ${ev.value} = ${res.value};
    +            $conditionMet = true;
    +          }
             }
           """
         }
     
    -    var generatedCode = cases.mkString("", "\nelse {\n", "\nelse {\n")
    -
    -    elseValue.foreach { elseExpr =>
    +    val elseCode = elseValue.map { elseExpr =>
           val res = elseExpr.genCode(ctx)
    -      generatedCode +=
    -        s"""
    +      s"""
    +        if(!$conditionMet) {
               ${res.code}
               ${ev.isNull} = ${res.isNull};
               ${ev.value} = ${res.value};
    -        """
    +        }
    +      """
         }
     
    -    generatedCode += "}\n" * cases.size
    +    val allConditions = cases ++ elseCode
    +
    +    val code = if (ctx.INPUT_ROW == null || ctx.currentVars != null) {
    +        allConditions.mkString("\n")
    +      } else {
    +        // This generates code like:
    +        // do {
    +        //   conditionMet = caseWhen_1(i);
    +        //   if(conditionMet) {
    +        //     continue;
    +        //   }
    +        //   conditionMet = caseWhen_2(i);
    +        //   if(conditionMet) {
    +        //     continue;
    +        //   }
    +        //   ...
    +        // } while (false);
    --- End diff --
    
    I'll try to explain with an example. If we have a lot of methods, this can exceed 64KB:
    ```
    do {
      condition = caseWhen_1(i);
      if (condition) {
        continue;
      }
      condition = caseWhen_2(i);
      if (condition) {
        continue;
      }
      // a lot of other methods here
    }
    ```
    Thus, #19480 can split them into:
    ```
    do {
      condition = bunchOf_caseWhen_1(i);
      if (condition) {
        continue;
      }
      condition = bunchOf_caseWhen_2(i);
      if (condition) {
        continue;
      }
      // maybe some other here
    } while (false);
    ...
    InnerClass1 {
      boolean bunchOf_caseWhen_1(InternalRow i) {
       boolean condition = false;
        do {
          condition = caseWhen_1(i);
          if (condition) {
            continue;
          }
          condition = caseWhen_2(i);
          if (condition) {
            continue;
          }
          // a lot of other methods here
        } while (false);
        return condition;
      }
      ...
    }
    ```
    in this case, the implementation you are suggesting can return a wrong value in `bunchOf_caseWhen_1` and this will affect the correctness of the code.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19752: [SPARK-22520][SQL] Support code generation for large Cas...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19752
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83866/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19752: [SPARK-22520][SQL] Support code generation for la...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19752#discussion_r153123184
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
    @@ -158,111 +178,73 @@ abstract class CaseWhenBase(
         val elseCase = elseValue.map(" ELSE " + _.sql).getOrElse("")
         "CASE" + cases + elseCase + " END"
       }
    -}
    -
    -
    -/**
    - * Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END".
    - * When a = true, returns b; when c = true, returns d; else returns e.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -// scalastyle:off line.size.limit
    -@ExpressionDescription(
    -  usage = "CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END - When `expr1` = true, returns `expr2`; else when `expr3` = true, returns `expr4`; else returns `expr5`.",
    -  arguments = """
    -    Arguments:
    -      * expr1, expr3 - the branch condition expressions should all be boolean type.
    -      * expr2, expr4, expr5 - the branch value expressions and else value expression should all be
    -          same type or coercible to a common type.
    -  """,
    -  examples = """
    -    Examples:
    -      > SELECT CASE WHEN 1 > 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       1
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       2
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 ELSE null END;
    -       NULL
    -  """)
    -// scalastyle:on line.size.limit
    -case class CaseWhen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with CodegenFallback with Serializable {
    -
    -  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    super[CodegenFallback].doGenCode(ctx, ev)
    -  }
    -
    -  def toCodegen(): CaseWhenCodegen = {
    -    CaseWhenCodegen(branches, elseValue)
    -  }
    -}
    -
    -/**
    - * CaseWhen expression used when code generation condition is satisfied.
    - * OptimizeCodegen optimizer replaces CaseWhen into CaseWhenCodegen.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -case class CaseWhenCodegen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with Serializable {
     
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    // Generate code that looks like:
    -    //
    -    // condA = ...
    -    // if (condA) {
    -    //   valueA
    -    // } else {
    -    //   condB = ...
    -    //   if (condB) {
    -    //     valueB
    -    //   } else {
    -    //     condC = ...
    -    //     if (condC) {
    -    //       valueC
    -    //     } else {
    -    //       elseValue
    -    //     }
    -    //   }
    -    // }
    +    // This variable represents whether the first successful condition is met or not.
    +    // It is initialized to `false` and it is set to `true` when the first condition which
    +    // evaluates to `true` is met and therefore is not needed to go on anymore on the computation
    +    // of the following conditions.
    +    val conditionMet = ctx.freshName("caseWhenConditionMet")
    +    ctx.addMutableState(ctx.JAVA_BOOLEAN, ev.isNull)
    +    ctx.addMutableState(ctx.javaType(dataType), ev.value)
         val cases = branches.map { case (condExpr, valueExpr) =>
           val cond = condExpr.genCode(ctx)
           val res = valueExpr.genCode(ctx)
           s"""
    -        ${cond.code}
    -        if (!${cond.isNull} && ${cond.value}) {
    -          ${res.code}
    -          ${ev.isNull} = ${res.isNull};
    -          ${ev.value} = ${res.value};
    +        if(!$conditionMet) {
    +          ${cond.code}
    +          if (!${cond.isNull} && ${cond.value}) {
    +            ${res.code}
    +            ${ev.isNull} = ${res.isNull};
    +            ${ev.value} = ${res.value};
    +            $conditionMet = true;
    +          }
             }
           """
         }
     
    -    var generatedCode = cases.mkString("", "\nelse {\n", "\nelse {\n")
    -
    -    elseValue.foreach { elseExpr =>
    +    val elseCode = elseValue.map { elseExpr =>
           val res = elseExpr.genCode(ctx)
    -      generatedCode +=
    -        s"""
    +      s"""
    +        if(!$conditionMet) {
               ${res.code}
               ${ev.isNull} = ${res.isNull};
               ${ev.value} = ${res.value};
    -        """
    +        }
    +      """
         }
     
    -    generatedCode += "}\n" * cases.size
    +    val allConditions = cases ++ elseCode
    +
    +    val code = if (ctx.INPUT_ROW == null || ctx.currentVars != null) {
    +        allConditions.mkString("\n")
    +      } else {
    +        ctx.splitExpressions(allConditions, "caseWhen",
    +          ("InternalRow", ctx.INPUT_ROW) :: (ctx.JAVA_BOOLEAN, conditionMet) :: Nil,
    --- End diff --
    
    Do we need to pass `conditionMet` as an argument? I think `conditionMet` is always `false` when a function is called.
    If true, we can declare `conditionMet` as a local variable.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19752: [SPARK-22520][SQL] Support code generation for large Cas...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19752
  
    **[Test build #84199 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84199/testReport)** for PR 19752 at commit [`9063583`](https://github.com/apache/spark/commit/9063583bce77348b9da61abec6e9fb5ae7aef117).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19752: [SPARK-22520][SQL] Support code generation for large Cas...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19752
  
    **[Test build #84229 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84229/testReport)** for PR 19752 at commit [`dd5f455`](https://github.com/apache/spark/commit/dd5f455541babc3b594d071f1aae8591cd01f8de).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19752: [SPARK-22520][SQL] Support code generation for large Cas...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19752
  
    **[Test build #84198 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84198/testReport)** for PR 19752 at commit [`f9c20be`](https://github.com/apache/spark/commit/f9c20bea19e1e03394a976c90012fc8744267065).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19752: [SPARK-22520][SQL] Support code generation for la...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/19752


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19752: [SPARK-22520][SQL] Support code generation for large Cas...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19752
  
    **[Test build #84224 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84224/testReport)** for PR 19752 at commit [`c7347b1`](https://github.com/apache/spark/commit/c7347b1565ed2697c0327218bd2c61cff59a8f33).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19752: [SPARK-22520][SQL] Support code generation for la...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19752#discussion_r153242781
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
    @@ -158,111 +178,86 @@ abstract class CaseWhenBase(
         val elseCase = elseValue.map(" ELSE " + _.sql).getOrElse("")
         "CASE" + cases + elseCase + " END"
       }
    -}
    -
    -
    -/**
    - * Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END".
    - * When a = true, returns b; when c = true, returns d; else returns e.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -// scalastyle:off line.size.limit
    -@ExpressionDescription(
    -  usage = "CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END - When `expr1` = true, returns `expr2`; else when `expr3` = true, returns `expr4`; else returns `expr5`.",
    -  arguments = """
    -    Arguments:
    -      * expr1, expr3 - the branch condition expressions should all be boolean type.
    -      * expr2, expr4, expr5 - the branch value expressions and else value expression should all be
    -          same type or coercible to a common type.
    -  """,
    -  examples = """
    -    Examples:
    -      > SELECT CASE WHEN 1 > 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       1
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       2
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 ELSE null END;
    -       NULL
    -  """)
    -// scalastyle:on line.size.limit
    -case class CaseWhen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with CodegenFallback with Serializable {
    -
    -  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    super[CodegenFallback].doGenCode(ctx, ev)
    -  }
    -
    -  def toCodegen(): CaseWhenCodegen = {
    -    CaseWhenCodegen(branches, elseValue)
    -  }
    -}
    -
    -/**
    - * CaseWhen expression used when code generation condition is satisfied.
    - * OptimizeCodegen optimizer replaces CaseWhen into CaseWhenCodegen.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -case class CaseWhenCodegen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with Serializable {
     
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    // Generate code that looks like:
    -    //
    -    // condA = ...
    -    // if (condA) {
    -    //   valueA
    -    // } else {
    -    //   condB = ...
    -    //   if (condB) {
    -    //     valueB
    -    //   } else {
    -    //     condC = ...
    -    //     if (condC) {
    -    //       valueC
    -    //     } else {
    -    //       elseValue
    -    //     }
    -    //   }
    -    // }
    +    // This variable represents whether the first successful condition is met or not.
    +    // It is initialized to `false` and it is set to `true` when the first condition which
    +    // evaluates to `true` is met and therefore is not needed to go on anymore on the computation
    +    // of the following conditions.
    +    val conditionMet = ctx.freshName("caseWhenConditionMet")
    +    ctx.addMutableState(ctx.JAVA_BOOLEAN, ev.isNull)
    +    ctx.addMutableState(ctx.javaType(dataType), ev.value)
         val cases = branches.map { case (condExpr, valueExpr) =>
           val cond = condExpr.genCode(ctx)
           val res = valueExpr.genCode(ctx)
           s"""
    -        ${cond.code}
    -        if (!${cond.isNull} && ${cond.value}) {
    -          ${res.code}
    -          ${ev.isNull} = ${res.isNull};
    -          ${ev.value} = ${res.value};
    +        if(!$conditionMet) {
    +          ${cond.code}
    +          if (!${cond.isNull} && ${cond.value}) {
    +            ${res.code}
    +            ${ev.isNull} = ${res.isNull};
    +            ${ev.value} = ${res.value};
    +            $conditionMet = true;
    +          }
             }
           """
         }
     
    -    var generatedCode = cases.mkString("", "\nelse {\n", "\nelse {\n")
    -
    -    elseValue.foreach { elseExpr =>
    +    val elseCode = elseValue.map { elseExpr =>
           val res = elseExpr.genCode(ctx)
    -      generatedCode +=
    -        s"""
    +      s"""
    +        if(!$conditionMet) {
               ${res.code}
               ${ev.isNull} = ${res.isNull};
               ${ev.value} = ${res.value};
    -        """
    +        }
    +      """
         }
     
    -    generatedCode += "}\n" * cases.size
    +    val allConditions = cases ++ elseCode
    +
    +    val code = if (ctx.INPUT_ROW == null || ctx.currentVars != null) {
    +        allConditions.mkString("\n")
    +      } else {
    +        // This generates code like:
    +        // do {
    +        //   conditionMet = caseWhen_1(i);
    +        //   if(conditionMet) {
    +        //     continue;
    +        //   }
    +        //   conditionMet = caseWhen_2(i);
    +        //   if(conditionMet) {
    +        //     continue;
    +        //   }
    +        //   ...
    +        // } while (false);
    --- End diff --
    
    this would require a refactoring of the `splitExpressions` method which is really not worth IMHO. I will leave this as it is now, while I am addressing your other comment, thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19752: [SPARK-22520][SQL] Support code generation for la...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19752#discussion_r153081719
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
    @@ -211,111 +231,62 @@ abstract class CaseWhenBase(
         val elseCase = elseValue.map(" ELSE " + _.sql).getOrElse("")
         "CASE" + cases + elseCase + " END"
       }
    -}
    -
    -
    -/**
    - * Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END".
    - * When a = true, returns b; when c = true, returns d; else returns e.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -// scalastyle:off line.size.limit
    -@ExpressionDescription(
    -  usage = "CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END - When `expr1` = true, returns `expr2`; else when `expr3` = true, returns `expr4`; else returns `expr5`.",
    -  arguments = """
    -    Arguments:
    -      * expr1, expr3 - the branch condition expressions should all be boolean type.
    -      * expr2, expr4, expr5 - the branch value expressions and else value expression should all be
    -          same type or coercible to a common type.
    -  """,
    -  examples = """
    -    Examples:
    -      > SELECT CASE WHEN 1 > 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       1
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       2
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 ELSE null END;
    -       NULL
    -  """)
    -// scalastyle:on line.size.limit
    -case class CaseWhen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with CodegenFallback with Serializable {
    -
    -  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    super[CodegenFallback].doGenCode(ctx, ev)
    -  }
    -
    -  def toCodegen(): CaseWhenCodegen = {
    -    CaseWhenCodegen(branches, elseValue)
    -  }
    -}
    -
    -/**
    - * CaseWhen expression used when code generation condition is satisfied.
    - * OptimizeCodegen optimizer replaces CaseWhen into CaseWhenCodegen.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -case class CaseWhenCodegen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with Serializable {
     
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    // Generate code that looks like:
    -    //
    -    // condA = ...
    -    // if (condA) {
    -    //   valueA
    -    // } else {
    -    //   condB = ...
    -    //   if (condB) {
    -    //     valueB
    -    //   } else {
    -    //     condC = ...
    -    //     if (condC) {
    -    //       valueC
    -    //     } else {
    -    //       elseValue
    -    //     }
    -    //   }
    -    // }
    +    val conditionMet = ctx.freshName("caseWhenConditionMet")
    +    ctx.addMutableState("boolean", ev.isNull, "")
    +    ctx.addMutableState(ctx.javaType(dataType), ev.value, "")
         val cases = branches.map { case (condExpr, valueExpr) =>
           val cond = condExpr.genCode(ctx)
           val res = valueExpr.genCode(ctx)
           s"""
    -        ${cond.code}
    -        if (!${cond.isNull} && ${cond.value}) {
    -          ${res.code}
    -          ${ev.isNull} = ${res.isNull};
    -          ${ev.value} = ${res.value};
    +        if(!$conditionMet) {
    +          ${cond.code}
    +          if (!${cond.isNull} && ${cond.value}) {
    +            ${res.code}
    +            ${ev.isNull} = ${res.isNull};
    +            ${ev.value} = ${res.value};
    +            $conditionMet = true;
    +          }
             }
           """
         }
     
    -    var generatedCode = cases.mkString("", "\nelse {\n", "\nelse {\n")
    -
    -    elseValue.foreach { elseExpr =>
    +    val elseCode = elseValue.map { elseExpr =>
           val res = elseExpr.genCode(ctx)
    -      generatedCode +=
    -        s"""
    +      s"""
    +        if(!$conditionMet) {
               ${res.code}
               ${ev.isNull} = ${res.isNull};
               ${ev.value} = ${res.value};
    -        """
    +        }
    +      """
         }
     
    -    generatedCode += "}\n" * cases.size
    +    val allConditions = cases ++ elseCode
    +
    +    val code = if (ctx.INPUT_ROW == null || ctx.currentVars != null) {
    +      allConditions.mkString("\n")
    +    } else {
    +      ctx.splitExpressions(allConditions, "caseWhen",
    +        ("InternalRow", ctx.INPUT_ROW) :: ("boolean", conditionMet) :: Nil, returnType = "boolean",
    +        makeSplitFunction = {
    +          func =>
    +            s"""
    +              $func
    +              return $conditionMet;
    +            """
    +        },
    +        foldFunctions = { funcCalls =>
    +          funcCalls.map(funcCall => s"$conditionMet = $funcCall;").mkString("\n")
    --- End diff --
    
    Ok, I'll do. Then I'd suggest to do the same also in other places. I can check where an analogous pattern is used and create a PR if it is ok.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19752: [SPARK-22520][SQL] Support code generation for la...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19752#discussion_r153226714
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
    @@ -158,111 +178,86 @@ abstract class CaseWhenBase(
         val elseCase = elseValue.map(" ELSE " + _.sql).getOrElse("")
         "CASE" + cases + elseCase + " END"
       }
    -}
    -
    -
    -/**
    - * Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END".
    - * When a = true, returns b; when c = true, returns d; else returns e.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -// scalastyle:off line.size.limit
    -@ExpressionDescription(
    -  usage = "CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END - When `expr1` = true, returns `expr2`; else when `expr3` = true, returns `expr4`; else returns `expr5`.",
    -  arguments = """
    -    Arguments:
    -      * expr1, expr3 - the branch condition expressions should all be boolean type.
    -      * expr2, expr4, expr5 - the branch value expressions and else value expression should all be
    -          same type or coercible to a common type.
    -  """,
    -  examples = """
    -    Examples:
    -      > SELECT CASE WHEN 1 > 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       1
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       2
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 ELSE null END;
    -       NULL
    -  """)
    -// scalastyle:on line.size.limit
    -case class CaseWhen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with CodegenFallback with Serializable {
    -
    -  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    super[CodegenFallback].doGenCode(ctx, ev)
    -  }
    -
    -  def toCodegen(): CaseWhenCodegen = {
    -    CaseWhenCodegen(branches, elseValue)
    -  }
    -}
    -
    -/**
    - * CaseWhen expression used when code generation condition is satisfied.
    - * OptimizeCodegen optimizer replaces CaseWhen into CaseWhenCodegen.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -case class CaseWhenCodegen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with Serializable {
     
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    // Generate code that looks like:
    -    //
    -    // condA = ...
    -    // if (condA) {
    -    //   valueA
    -    // } else {
    -    //   condB = ...
    -    //   if (condB) {
    -    //     valueB
    -    //   } else {
    -    //     condC = ...
    -    //     if (condC) {
    -    //       valueC
    -    //     } else {
    -    //       elseValue
    -    //     }
    -    //   }
    -    // }
    +    // This variable represents whether the first successful condition is met or not.
    +    // It is initialized to `false` and it is set to `true` when the first condition which
    +    // evaluates to `true` is met and therefore is not needed to go on anymore on the computation
    +    // of the following conditions.
    +    val conditionMet = ctx.freshName("caseWhenConditionMet")
    +    ctx.addMutableState(ctx.JAVA_BOOLEAN, ev.isNull)
    +    ctx.addMutableState(ctx.javaType(dataType), ev.value)
         val cases = branches.map { case (condExpr, valueExpr) =>
           val cond = condExpr.genCode(ctx)
           val res = valueExpr.genCode(ctx)
           s"""
    -        ${cond.code}
    -        if (!${cond.isNull} && ${cond.value}) {
    -          ${res.code}
    -          ${ev.isNull} = ${res.isNull};
    -          ${ev.value} = ${res.value};
    +        if(!$conditionMet) {
    +          ${cond.code}
    +          if (!${cond.isNull} && ${cond.value}) {
    +            ${res.code}
    +            ${ev.isNull} = ${res.isNull};
    +            ${ev.value} = ${res.value};
    +            $conditionMet = true;
    +          }
             }
           """
         }
     
    -    var generatedCode = cases.mkString("", "\nelse {\n", "\nelse {\n")
    -
    -    elseValue.foreach { elseExpr =>
    +    val elseCode = elseValue.map { elseExpr =>
           val res = elseExpr.genCode(ctx)
    -      generatedCode +=
    -        s"""
    +      s"""
    +        if(!$conditionMet) {
               ${res.code}
               ${ev.isNull} = ${res.isNull};
               ${ev.value} = ${res.value};
    -        """
    +        }
    +      """
         }
     
    -    generatedCode += "}\n" * cases.size
    +    val allConditions = cases ++ elseCode
    +
    +    val code = if (ctx.INPUT_ROW == null || ctx.currentVars != null) {
    +        allConditions.mkString("\n")
    +      } else {
    +        // This generates code like:
    +        // do {
    +        //   conditionMet = caseWhen_1(i);
    +        //   if(conditionMet) {
    +        //     continue;
    +        //   }
    +        //   conditionMet = caseWhen_2(i);
    +        //   if(conditionMet) {
    +        //     continue;
    +        //   }
    +        //   ...
    +        // } while (false);
    --- End diff --
    
    I don't get it, are you saying
    ```
    do {
      if (caseWhen_1(i)) {
        continue;
      }
      if (caseWhen_2(i)) {
        continue;
      }
    }
    ```
    is worse than
    ```
    do {
      condition = caseWhen_1(i);
      if (condition) {
        continue;
      }
      condition = caseWhen_2(i);
      if (condition) {
        continue;
      }
    }
    ```
    ?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19752: [SPARK-22520][SQL] Support code generation for la...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19752#discussion_r153181270
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
    @@ -158,111 +178,86 @@ abstract class CaseWhenBase(
         val elseCase = elseValue.map(" ELSE " + _.sql).getOrElse("")
         "CASE" + cases + elseCase + " END"
       }
    -}
    -
    -
    -/**
    - * Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END".
    - * When a = true, returns b; when c = true, returns d; else returns e.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -// scalastyle:off line.size.limit
    -@ExpressionDescription(
    -  usage = "CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END - When `expr1` = true, returns `expr2`; else when `expr3` = true, returns `expr4`; else returns `expr5`.",
    -  arguments = """
    -    Arguments:
    -      * expr1, expr3 - the branch condition expressions should all be boolean type.
    -      * expr2, expr4, expr5 - the branch value expressions and else value expression should all be
    -          same type or coercible to a common type.
    -  """,
    -  examples = """
    -    Examples:
    -      > SELECT CASE WHEN 1 > 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       1
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       2
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 ELSE null END;
    -       NULL
    -  """)
    -// scalastyle:on line.size.limit
    -case class CaseWhen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with CodegenFallback with Serializable {
    -
    -  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    super[CodegenFallback].doGenCode(ctx, ev)
    -  }
    -
    -  def toCodegen(): CaseWhenCodegen = {
    -    CaseWhenCodegen(branches, elseValue)
    -  }
    -}
    -
    -/**
    - * CaseWhen expression used when code generation condition is satisfied.
    - * OptimizeCodegen optimizer replaces CaseWhen into CaseWhenCodegen.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -case class CaseWhenCodegen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with Serializable {
     
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    // Generate code that looks like:
    -    //
    -    // condA = ...
    -    // if (condA) {
    -    //   valueA
    -    // } else {
    -    //   condB = ...
    -    //   if (condB) {
    -    //     valueB
    -    //   } else {
    -    //     condC = ...
    -    //     if (condC) {
    -    //       valueC
    -    //     } else {
    -    //       elseValue
    -    //     }
    -    //   }
    -    // }
    +    // This variable represents whether the first successful condition is met or not.
    +    // It is initialized to `false` and it is set to `true` when the first condition which
    +    // evaluates to `true` is met and therefore is not needed to go on anymore on the computation
    +    // of the following conditions.
    +    val conditionMet = ctx.freshName("caseWhenConditionMet")
    +    ctx.addMutableState(ctx.JAVA_BOOLEAN, ev.isNull)
    +    ctx.addMutableState(ctx.javaType(dataType), ev.value)
         val cases = branches.map { case (condExpr, valueExpr) =>
           val cond = condExpr.genCode(ctx)
           val res = valueExpr.genCode(ctx)
           s"""
    -        ${cond.code}
    -        if (!${cond.isNull} && ${cond.value}) {
    -          ${res.code}
    -          ${ev.isNull} = ${res.isNull};
    -          ${ev.value} = ${res.value};
    +        if(!$conditionMet) {
    +          ${cond.code}
    +          if (!${cond.isNull} && ${cond.value}) {
    +            ${res.code}
    +            ${ev.isNull} = ${res.isNull};
    +            ${ev.value} = ${res.value};
    +            $conditionMet = true;
    +          }
             }
           """
         }
     
    -    var generatedCode = cases.mkString("", "\nelse {\n", "\nelse {\n")
    -
    -    elseValue.foreach { elseExpr =>
    +    val elseCode = elseValue.map { elseExpr =>
           val res = elseExpr.genCode(ctx)
    -      generatedCode +=
    -        s"""
    +      s"""
    +        if(!$conditionMet) {
               ${res.code}
               ${ev.isNull} = ${res.isNull};
               ${ev.value} = ${res.value};
    -        """
    +        }
    +      """
         }
     
    -    generatedCode += "}\n" * cases.size
    +    val allConditions = cases ++ elseCode
    +
    +    val code = if (ctx.INPUT_ROW == null || ctx.currentVars != null) {
    +        allConditions.mkString("\n")
    +      } else {
    +        // This generates code like:
    +        // do {
    +        //   conditionMet = caseWhen_1(i);
    +        //   if(conditionMet) {
    +        //     continue;
    +        //   }
    +        //   conditionMet = caseWhen_2(i);
    +        //   if(conditionMet) {
    +        //     continue;
    +        //   }
    +        //   ...
    +        // } while (false);
    +        ctx.splitExpressions(allConditions, "caseWhen",
    +          ("InternalRow", ctx.INPUT_ROW) :: Nil,
    +          returnType = ctx.JAVA_BOOLEAN,
    +          makeSplitFunction = {
    +            func =>
    +              s"""
    +                ${ctx.JAVA_BOOLEAN} $conditionMet = false;
    +                $func
    +                return $conditionMet;
    --- End diff --
    
    I think this would complicate the code and I don't think it is worth, since if the code is not split, it means that we don't have many conditions, thus we would save only few `if (conditionMet)` evaluations. What do you think?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19752: [SPARK-22520][SQL] Support code generation for la...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19752#discussion_r152482401
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
    @@ -211,111 +231,61 @@ abstract class CaseWhenBase(
         val elseCase = elseValue.map(" ELSE " + _.sql).getOrElse("")
         "CASE" + cases + elseCase + " END"
       }
    -}
    -
    -
    -/**
    - * Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END".
    - * When a = true, returns b; when c = true, returns d; else returns e.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -// scalastyle:off line.size.limit
    -@ExpressionDescription(
    -  usage = "CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END - When `expr1` = true, returns `expr2`; else when `expr3` = true, returns `expr4`; else returns `expr5`.",
    -  arguments = """
    -    Arguments:
    -      * expr1, expr3 - the branch condition expressions should all be boolean type.
    -      * expr2, expr4, expr5 - the branch value expressions and else value expression should all be
    -          same type or coercible to a common type.
    -  """,
    -  examples = """
    -    Examples:
    -      > SELECT CASE WHEN 1 > 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       1
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       2
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 ELSE null END;
    -       NULL
    -  """)
    -// scalastyle:on line.size.limit
    -case class CaseWhen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with CodegenFallback with Serializable {
    -
    -  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    super[CodegenFallback].doGenCode(ctx, ev)
    -  }
    -
    -  def toCodegen(): CaseWhenCodegen = {
    -    CaseWhenCodegen(branches, elseValue)
    -  }
    -}
    -
    -/**
    - * CaseWhen expression used when code generation condition is satisfied.
    - * OptimizeCodegen optimizer replaces CaseWhen into CaseWhenCodegen.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -case class CaseWhenCodegen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with Serializable {
     
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    // Generate code that looks like:
    -    //
    -    // condA = ...
    -    // if (condA) {
    -    //   valueA
    -    // } else {
    -    //   condB = ...
    -    //   if (condB) {
    -    //     valueB
    -    //   } else {
    -    //     condC = ...
    -    //     if (condC) {
    -    //       valueC
    -    //     } else {
    -    //       elseValue
    -    //     }
    -    //   }
    -    // }
    +    val conditionMet = ctx.freshName("caseWhenConditionMet")
    +    ctx.addMutableState("boolean", ev.isNull, "")
    +    ctx.addMutableState(ctx.javaType(dataType), ev.value, "")
         val cases = branches.map { case (condExpr, valueExpr) =>
           val cond = condExpr.genCode(ctx)
           val res = valueExpr.genCode(ctx)
           s"""
    -        ${cond.code}
    -        if (!${cond.isNull} && ${cond.value}) {
    -          ${res.code}
    -          ${ev.isNull} = ${res.isNull};
    -          ${ev.value} = ${res.value};
    +        if(!$conditionMet) {
    +          ${cond.code}
    +          if (!${cond.isNull} && ${cond.value}) {
    +            ${res.code}
    +            ${ev.isNull} = ${res.isNull};
    +            ${ev.value} = ${res.value};
    +            $conditionMet = true;
    +          }
             }
           """
         }
     
    -    var generatedCode = cases.mkString("", "\nelse {\n", "\nelse {\n")
    -
    -    elseValue.foreach { elseExpr =>
    +    val elseCode = elseValue.map { elseExpr =>
           val res = elseExpr.genCode(ctx)
    -      generatedCode +=
    -        s"""
    +      s"""
    +        if(!$conditionMet) {
               ${res.code}
               ${ev.isNull} = ${res.isNull};
               ${ev.value} = ${res.value};
    -        """
    -    }
    +        }
    +      """
    +    }.getOrElse("")
     
    -    generatedCode += "}\n" * cases.size
    +    val casesCode = if (ctx.INPUT_ROW == null || ctx.currentVars != null) {
    +      cases.mkString("\n")
    +    } else {
    +      ctx.splitExpressions(cases, "caseWhen",
    --- End diff --
    
    In almost all the cases, we do not need to call `splitExpressions` after merging the PR https://github.com/apache/spark/pull/19767?
    
    WDYT?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19752: [SPARK-22520][SQL] Support code generation for la...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19752#discussion_r153236140
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
    @@ -158,111 +178,86 @@ abstract class CaseWhenBase(
         val elseCase = elseValue.map(" ELSE " + _.sql).getOrElse("")
         "CASE" + cases + elseCase + " END"
       }
    -}
    -
    -
    -/**
    - * Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END".
    - * When a = true, returns b; when c = true, returns d; else returns e.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -// scalastyle:off line.size.limit
    -@ExpressionDescription(
    -  usage = "CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END - When `expr1` = true, returns `expr2`; else when `expr3` = true, returns `expr4`; else returns `expr5`.",
    -  arguments = """
    -    Arguments:
    -      * expr1, expr3 - the branch condition expressions should all be boolean type.
    -      * expr2, expr4, expr5 - the branch value expressions and else value expression should all be
    -          same type or coercible to a common type.
    -  """,
    -  examples = """
    -    Examples:
    -      > SELECT CASE WHEN 1 > 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       1
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       2
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 ELSE null END;
    -       NULL
    -  """)
    -// scalastyle:on line.size.limit
    -case class CaseWhen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with CodegenFallback with Serializable {
    -
    -  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    super[CodegenFallback].doGenCode(ctx, ev)
    -  }
    -
    -  def toCodegen(): CaseWhenCodegen = {
    -    CaseWhenCodegen(branches, elseValue)
    -  }
    -}
    -
    -/**
    - * CaseWhen expression used when code generation condition is satisfied.
    - * OptimizeCodegen optimizer replaces CaseWhen into CaseWhenCodegen.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -case class CaseWhenCodegen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with Serializable {
     
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    // Generate code that looks like:
    -    //
    -    // condA = ...
    -    // if (condA) {
    -    //   valueA
    -    // } else {
    -    //   condB = ...
    -    //   if (condB) {
    -    //     valueB
    -    //   } else {
    -    //     condC = ...
    -    //     if (condC) {
    -    //       valueC
    -    //     } else {
    -    //       elseValue
    -    //     }
    -    //   }
    -    // }
    +    // This variable represents whether the first successful condition is met or not.
    +    // It is initialized to `false` and it is set to `true` when the first condition which
    +    // evaluates to `true` is met and therefore is not needed to go on anymore on the computation
    +    // of the following conditions.
    +    val conditionMet = ctx.freshName("caseWhenConditionMet")
    +    ctx.addMutableState(ctx.JAVA_BOOLEAN, ev.isNull)
    +    ctx.addMutableState(ctx.javaType(dataType), ev.value)
         val cases = branches.map { case (condExpr, valueExpr) =>
           val cond = condExpr.genCode(ctx)
           val res = valueExpr.genCode(ctx)
           s"""
    -        ${cond.code}
    -        if (!${cond.isNull} && ${cond.value}) {
    -          ${res.code}
    -          ${ev.isNull} = ${res.isNull};
    -          ${ev.value} = ${res.value};
    +        if(!$conditionMet) {
    +          ${cond.code}
    +          if (!${cond.isNull} && ${cond.value}) {
    +            ${res.code}
    +            ${ev.isNull} = ${res.isNull};
    +            ${ev.value} = ${res.value};
    +            $conditionMet = true;
    +          }
             }
           """
         }
     
    -    var generatedCode = cases.mkString("", "\nelse {\n", "\nelse {\n")
    -
    -    elseValue.foreach { elseExpr =>
    +    val elseCode = elseValue.map { elseExpr =>
           val res = elseExpr.genCode(ctx)
    -      generatedCode +=
    -        s"""
    +      s"""
    +        if(!$conditionMet) {
               ${res.code}
               ${ev.isNull} = ${res.isNull};
               ${ev.value} = ${res.value};
    -        """
    +        }
    +      """
         }
     
    -    generatedCode += "}\n" * cases.size
    +    val allConditions = cases ++ elseCode
    +
    +    val code = if (ctx.INPUT_ROW == null || ctx.currentVars != null) {
    +        allConditions.mkString("\n")
    +      } else {
    +        // This generates code like:
    +        // do {
    +        //   conditionMet = caseWhen_1(i);
    +        //   if(conditionMet) {
    +        //     continue;
    +        //   }
    +        //   conditionMet = caseWhen_2(i);
    +        //   if(conditionMet) {
    +        //     continue;
    +        //   }
    +        //   ...
    +        // } while (false);
    +        ctx.splitExpressions(allConditions, "caseWhen",
    +          ("InternalRow", ctx.INPUT_ROW) :: Nil,
    +          returnType = ctx.JAVA_BOOLEAN,
    +          makeSplitFunction = {
    +            func =>
    +              s"""
    +                ${ctx.JAVA_BOOLEAN} $conditionMet = false;
    +                $func
    +                return $conditionMet;
    --- End diff --
    
    yes, but having this optimization outside means skipping whole methods. Anyway, if you think that this optimization is needed I can do it. I think only that the code readability would be a bit worse but I'll try to address this problem with comments.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19752: [SPARK-22520][SQL] Support code generation for la...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19752#discussion_r153241209
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
    @@ -158,111 +178,86 @@ abstract class CaseWhenBase(
         val elseCase = elseValue.map(" ELSE " + _.sql).getOrElse("")
         "CASE" + cases + elseCase + " END"
       }
    -}
    -
    -
    -/**
    - * Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END".
    - * When a = true, returns b; when c = true, returns d; else returns e.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -// scalastyle:off line.size.limit
    -@ExpressionDescription(
    -  usage = "CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END - When `expr1` = true, returns `expr2`; else when `expr3` = true, returns `expr4`; else returns `expr5`.",
    -  arguments = """
    -    Arguments:
    -      * expr1, expr3 - the branch condition expressions should all be boolean type.
    -      * expr2, expr4, expr5 - the branch value expressions and else value expression should all be
    -          same type or coercible to a common type.
    -  """,
    -  examples = """
    -    Examples:
    -      > SELECT CASE WHEN 1 > 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       1
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       2
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 ELSE null END;
    -       NULL
    -  """)
    -// scalastyle:on line.size.limit
    -case class CaseWhen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with CodegenFallback with Serializable {
    -
    -  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    super[CodegenFallback].doGenCode(ctx, ev)
    -  }
    -
    -  def toCodegen(): CaseWhenCodegen = {
    -    CaseWhenCodegen(branches, elseValue)
    -  }
    -}
    -
    -/**
    - * CaseWhen expression used when code generation condition is satisfied.
    - * OptimizeCodegen optimizer replaces CaseWhen into CaseWhenCodegen.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -case class CaseWhenCodegen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with Serializable {
     
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    // Generate code that looks like:
    -    //
    -    // condA = ...
    -    // if (condA) {
    -    //   valueA
    -    // } else {
    -    //   condB = ...
    -    //   if (condB) {
    -    //     valueB
    -    //   } else {
    -    //     condC = ...
    -    //     if (condC) {
    -    //       valueC
    -    //     } else {
    -    //       elseValue
    -    //     }
    -    //   }
    -    // }
    +    // This variable represents whether the first successful condition is met or not.
    +    // It is initialized to `false` and it is set to `true` when the first condition which
    +    // evaluates to `true` is met and therefore is not needed to go on anymore on the computation
    +    // of the following conditions.
    +    val conditionMet = ctx.freshName("caseWhenConditionMet")
    +    ctx.addMutableState(ctx.JAVA_BOOLEAN, ev.isNull)
    +    ctx.addMutableState(ctx.javaType(dataType), ev.value)
         val cases = branches.map { case (condExpr, valueExpr) =>
           val cond = condExpr.genCode(ctx)
           val res = valueExpr.genCode(ctx)
           s"""
    -        ${cond.code}
    -        if (!${cond.isNull} && ${cond.value}) {
    -          ${res.code}
    -          ${ev.isNull} = ${res.isNull};
    -          ${ev.value} = ${res.value};
    +        if(!$conditionMet) {
    +          ${cond.code}
    +          if (!${cond.isNull} && ${cond.value}) {
    +            ${res.code}
    +            ${ev.isNull} = ${res.isNull};
    +            ${ev.value} = ${res.value};
    +            $conditionMet = true;
    +          }
             }
           """
         }
     
    -    var generatedCode = cases.mkString("", "\nelse {\n", "\nelse {\n")
    -
    -    elseValue.foreach { elseExpr =>
    +    val elseCode = elseValue.map { elseExpr =>
           val res = elseExpr.genCode(ctx)
    -      generatedCode +=
    -        s"""
    +      s"""
    +        if(!$conditionMet) {
               ${res.code}
               ${ev.isNull} = ${res.isNull};
               ${ev.value} = ${res.value};
    -        """
    +        }
    +      """
         }
     
    -    generatedCode += "}\n" * cases.size
    +    val allConditions = cases ++ elseCode
    +
    +    val code = if (ctx.INPUT_ROW == null || ctx.currentVars != null) {
    +        allConditions.mkString("\n")
    +      } else {
    +        // This generates code like:
    +        // do {
    +        //   conditionMet = caseWhen_1(i);
    +        //   if(conditionMet) {
    +        //     continue;
    +        //   }
    +        //   conditionMet = caseWhen_2(i);
    +        //   if(conditionMet) {
    +        //     continue;
    +        //   }
    +        //   ...
    +        // } while (false);
    --- End diff --
    
    ah i see, we have to set the `condition` if the code is inside a method.
    ```
    do {
      if (bunchOf_caseWhen_1(i)) {
        continue;
      }
      if (bunchOf_caseWhen_2(i)) {
        continue;
      }
      // maybe some other here
    } while (false);
    ...
    InnerClass1 {
      boolean bunchOf_caseWhen_1(InternalRow i) {
        do {
          if (caseWhen_1(i)) {
            return true;
          }
          if (caseWhen_2(i)) {
            continue;
          }
          // a lot of other methods here
        } while (false);
        return false;
      }
      ...
    }
    ```
    This would be the optimal but it's too much complexity with only a little gain.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19752: [SPARK-22520][SQL] Support code generation for la...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19752#discussion_r153183113
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
    @@ -158,111 +178,86 @@ abstract class CaseWhenBase(
         val elseCase = elseValue.map(" ELSE " + _.sql).getOrElse("")
         "CASE" + cases + elseCase + " END"
       }
    -}
    -
    -
    -/**
    - * Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END".
    - * When a = true, returns b; when c = true, returns d; else returns e.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -// scalastyle:off line.size.limit
    -@ExpressionDescription(
    -  usage = "CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END - When `expr1` = true, returns `expr2`; else when `expr3` = true, returns `expr4`; else returns `expr5`.",
    -  arguments = """
    -    Arguments:
    -      * expr1, expr3 - the branch condition expressions should all be boolean type.
    -      * expr2, expr4, expr5 - the branch value expressions and else value expression should all be
    -          same type or coercible to a common type.
    -  """,
    -  examples = """
    -    Examples:
    -      > SELECT CASE WHEN 1 > 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       1
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       2
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 ELSE null END;
    -       NULL
    -  """)
    -// scalastyle:on line.size.limit
    -case class CaseWhen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with CodegenFallback with Serializable {
    -
    -  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    super[CodegenFallback].doGenCode(ctx, ev)
    -  }
    -
    -  def toCodegen(): CaseWhenCodegen = {
    -    CaseWhenCodegen(branches, elseValue)
    -  }
    -}
    -
    -/**
    - * CaseWhen expression used when code generation condition is satisfied.
    - * OptimizeCodegen optimizer replaces CaseWhen into CaseWhenCodegen.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -case class CaseWhenCodegen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with Serializable {
     
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    // Generate code that looks like:
    -    //
    -    // condA = ...
    -    // if (condA) {
    -    //   valueA
    -    // } else {
    -    //   condB = ...
    -    //   if (condB) {
    -    //     valueB
    -    //   } else {
    -    //     condC = ...
    -    //     if (condC) {
    -    //       valueC
    -    //     } else {
    -    //       elseValue
    -    //     }
    -    //   }
    -    // }
    +    // This variable represents whether the first successful condition is met or not.
    +    // It is initialized to `false` and it is set to `true` when the first condition which
    +    // evaluates to `true` is met and therefore is not needed to go on anymore on the computation
    +    // of the following conditions.
    +    val conditionMet = ctx.freshName("caseWhenConditionMet")
    +    ctx.addMutableState(ctx.JAVA_BOOLEAN, ev.isNull)
    +    ctx.addMutableState(ctx.javaType(dataType), ev.value)
         val cases = branches.map { case (condExpr, valueExpr) =>
           val cond = condExpr.genCode(ctx)
           val res = valueExpr.genCode(ctx)
           s"""
    -        ${cond.code}
    -        if (!${cond.isNull} && ${cond.value}) {
    -          ${res.code}
    -          ${ev.isNull} = ${res.isNull};
    -          ${ev.value} = ${res.value};
    +        if(!$conditionMet) {
    +          ${cond.code}
    +          if (!${cond.isNull} && ${cond.value}) {
    +            ${res.code}
    +            ${ev.isNull} = ${res.isNull};
    +            ${ev.value} = ${res.value};
    +            $conditionMet = true;
    +          }
             }
           """
         }
     
    -    var generatedCode = cases.mkString("", "\nelse {\n", "\nelse {\n")
    -
    -    elseValue.foreach { elseExpr =>
    +    val elseCode = elseValue.map { elseExpr =>
           val res = elseExpr.genCode(ctx)
    -      generatedCode +=
    -        s"""
    +      s"""
    +        if(!$conditionMet) {
               ${res.code}
               ${ev.isNull} = ${res.isNull};
               ${ev.value} = ${res.value};
    -        """
    +        }
    +      """
         }
     
    -    generatedCode += "}\n" * cases.size
    +    val allConditions = cases ++ elseCode
    +
    +    val code = if (ctx.INPUT_ROW == null || ctx.currentVars != null) {
    +        allConditions.mkString("\n")
    +      } else {
    +        // This generates code like:
    +        // do {
    +        //   conditionMet = caseWhen_1(i);
    +        //   if(conditionMet) {
    +        //     continue;
    +        //   }
    +        //   conditionMet = caseWhen_2(i);
    +        //   if(conditionMet) {
    +        //     continue;
    +        //   }
    +        //   ...
    +        // } while (false);
    --- End diff --
    
    No, because in this way we would not set `conditionMet`. This can cause an error, in the case we have so many functions that their invocation goes beyond the 64KB limit: this problem is fixed by #19480, by creating a method which calls the methods of that class, In this case, if we don't set properly `conditionMet`, we would have a bug in the implementation. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19752: [SPARK-22520][SQL] Support code generation for large Cas...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19752
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19752: [SPARK-22520][SQL] Support code generation for la...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19752#discussion_r153079586
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
    @@ -211,111 +231,62 @@ abstract class CaseWhenBase(
         val elseCase = elseValue.map(" ELSE " + _.sql).getOrElse("")
         "CASE" + cases + elseCase + " END"
       }
    -}
    -
    -
    -/**
    - * Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END".
    - * When a = true, returns b; when c = true, returns d; else returns e.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -// scalastyle:off line.size.limit
    -@ExpressionDescription(
    -  usage = "CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END - When `expr1` = true, returns `expr2`; else when `expr3` = true, returns `expr4`; else returns `expr5`.",
    -  arguments = """
    -    Arguments:
    -      * expr1, expr3 - the branch condition expressions should all be boolean type.
    -      * expr2, expr4, expr5 - the branch value expressions and else value expression should all be
    -          same type or coercible to a common type.
    -  """,
    -  examples = """
    -    Examples:
    -      > SELECT CASE WHEN 1 > 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       1
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       2
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 ELSE null END;
    -       NULL
    -  """)
    -// scalastyle:on line.size.limit
    -case class CaseWhen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with CodegenFallback with Serializable {
    -
    -  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    super[CodegenFallback].doGenCode(ctx, ev)
    -  }
    -
    -  def toCodegen(): CaseWhenCodegen = {
    -    CaseWhenCodegen(branches, elseValue)
    -  }
    -}
    -
    -/**
    - * CaseWhen expression used when code generation condition is satisfied.
    - * OptimizeCodegen optimizer replaces CaseWhen into CaseWhenCodegen.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -case class CaseWhenCodegen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with Serializable {
     
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    // Generate code that looks like:
    -    //
    -    // condA = ...
    -    // if (condA) {
    -    //   valueA
    -    // } else {
    -    //   condB = ...
    -    //   if (condB) {
    -    //     valueB
    -    //   } else {
    -    //     condC = ...
    -    //     if (condC) {
    -    //       valueC
    -    //     } else {
    -    //       elseValue
    -    //     }
    -    //   }
    -    // }
    +    val conditionMet = ctx.freshName("caseWhenConditionMet")
    --- End diff --
    
    Add a comment to explain what it is.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19752: [SPARK-22520][SQL] Support code generation for large Cas...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/19752
  
    @kiszk may I kindly ask you to review this please? Thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19752: [SPARK-22520][SQL] Support code generation for la...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19752#discussion_r153081547
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
    @@ -211,111 +231,62 @@ abstract class CaseWhenBase(
         val elseCase = elseValue.map(" ELSE " + _.sql).getOrElse("")
         "CASE" + cases + elseCase + " END"
       }
    -}
    -
    -
    -/**
    - * Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END".
    - * When a = true, returns b; when c = true, returns d; else returns e.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -// scalastyle:off line.size.limit
    -@ExpressionDescription(
    -  usage = "CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END - When `expr1` = true, returns `expr2`; else when `expr3` = true, returns `expr4`; else returns `expr5`.",
    -  arguments = """
    -    Arguments:
    -      * expr1, expr3 - the branch condition expressions should all be boolean type.
    -      * expr2, expr4, expr5 - the branch value expressions and else value expression should all be
    -          same type or coercible to a common type.
    -  """,
    -  examples = """
    -    Examples:
    -      > SELECT CASE WHEN 1 > 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       1
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       2
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 ELSE null END;
    -       NULL
    -  """)
    -// scalastyle:on line.size.limit
    -case class CaseWhen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with CodegenFallback with Serializable {
    -
    -  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    super[CodegenFallback].doGenCode(ctx, ev)
    -  }
    -
    -  def toCodegen(): CaseWhenCodegen = {
    -    CaseWhenCodegen(branches, elseValue)
    -  }
    -}
    -
    -/**
    - * CaseWhen expression used when code generation condition is satisfied.
    - * OptimizeCodegen optimizer replaces CaseWhen into CaseWhenCodegen.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -case class CaseWhenCodegen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with Serializable {
     
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    // Generate code that looks like:
    -    //
    -    // condA = ...
    -    // if (condA) {
    -    //   valueA
    -    // } else {
    -    //   condB = ...
    -    //   if (condB) {
    -    //     valueB
    -    //   } else {
    -    //     condC = ...
    -    //     if (condC) {
    -    //       valueC
    -    //     } else {
    -    //       elseValue
    -    //     }
    -    //   }
    -    // }
    +    val conditionMet = ctx.freshName("caseWhenConditionMet")
    +    ctx.addMutableState("boolean", ev.isNull, "")
    +    ctx.addMutableState(ctx.javaType(dataType), ev.value, "")
         val cases = branches.map { case (condExpr, valueExpr) =>
           val cond = condExpr.genCode(ctx)
           val res = valueExpr.genCode(ctx)
           s"""
    -        ${cond.code}
    -        if (!${cond.isNull} && ${cond.value}) {
    -          ${res.code}
    -          ${ev.isNull} = ${res.isNull};
    -          ${ev.value} = ${res.value};
    +        if(!$conditionMet) {
    +          ${cond.code}
    +          if (!${cond.isNull} && ${cond.value}) {
    +            ${res.code}
    +            ${ev.isNull} = ${res.isNull};
    +            ${ev.value} = ${res.value};
    +            $conditionMet = true;
    +          }
             }
           """
         }
     
    -    var generatedCode = cases.mkString("", "\nelse {\n", "\nelse {\n")
    -
    -    elseValue.foreach { elseExpr =>
    +    val elseCode = elseValue.map { elseExpr =>
           val res = elseExpr.genCode(ctx)
    -      generatedCode +=
    -        s"""
    +      s"""
    +        if(!$conditionMet) {
               ${res.code}
               ${ev.isNull} = ${res.isNull};
               ${ev.value} = ${res.value};
    -        """
    +        }
    +      """
         }
     
    -    generatedCode += "}\n" * cases.size
    +    val allConditions = cases ++ elseCode
    +
    +    val code = if (ctx.INPUT_ROW == null || ctx.currentVars != null) {
    +      allConditions.mkString("\n")
    +    } else {
    +      ctx.splitExpressions(allConditions, "caseWhen",
    +        ("InternalRow", ctx.INPUT_ROW) :: ("boolean", conditionMet) :: Nil, returnType = "boolean",
    +        makeSplitFunction = {
    +          func =>
    +            s"""
    +              $func
    +              return $conditionMet;
    +            """
    +        },
    +        foldFunctions = { funcCalls =>
    +          funcCalls.map(funcCall => s"$conditionMet = $funcCall;").mkString("\n")
    --- End diff --
    
    We want to avoid the extra function calls here. It is not cheap when the number of rows is large. Now, we split the functions pretty aggressively. I saw many new functions are generated.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19752: [SPARK-22520][SQL] Support code generation for large Cas...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19752
  
    **[Test build #84199 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84199/testReport)** for PR 19752 at commit [`9063583`](https://github.com/apache/spark/commit/9063583bce77348b9da61abec6e9fb5ae7aef117).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19752: [SPARK-22520][SQL] Support code generation for large Cas...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19752
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19752: [SPARK-22520][SQL] Support code generation for la...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19752#discussion_r153118605
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
    @@ -211,111 +231,73 @@ abstract class CaseWhenBase(
         val elseCase = elseValue.map(" ELSE " + _.sql).getOrElse("")
         "CASE" + cases + elseCase + " END"
       }
    -}
    -
    -
    -/**
    - * Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END".
    - * When a = true, returns b; when c = true, returns d; else returns e.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -// scalastyle:off line.size.limit
    -@ExpressionDescription(
    -  usage = "CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END - When `expr1` = true, returns `expr2`; else when `expr3` = true, returns `expr4`; else returns `expr5`.",
    -  arguments = """
    -    Arguments:
    -      * expr1, expr3 - the branch condition expressions should all be boolean type.
    -      * expr2, expr4, expr5 - the branch value expressions and else value expression should all be
    -          same type or coercible to a common type.
    -  """,
    -  examples = """
    -    Examples:
    -      > SELECT CASE WHEN 1 > 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       1
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       2
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 ELSE null END;
    -       NULL
    -  """)
    -// scalastyle:on line.size.limit
    -case class CaseWhen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with CodegenFallback with Serializable {
    -
    -  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    super[CodegenFallback].doGenCode(ctx, ev)
    -  }
    -
    -  def toCodegen(): CaseWhenCodegen = {
    -    CaseWhenCodegen(branches, elseValue)
    -  }
    -}
    -
    -/**
    - * CaseWhen expression used when code generation condition is satisfied.
    - * OptimizeCodegen optimizer replaces CaseWhen into CaseWhenCodegen.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -case class CaseWhenCodegen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with Serializable {
     
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    // Generate code that looks like:
    -    //
    -    // condA = ...
    -    // if (condA) {
    -    //   valueA
    -    // } else {
    -    //   condB = ...
    -    //   if (condB) {
    -    //     valueB
    -    //   } else {
    -    //     condC = ...
    -    //     if (condC) {
    -    //       valueC
    -    //     } else {
    -    //       elseValue
    -    //     }
    -    //   }
    -    // }
    +    // This variable represents whether the first successful condition is met or not.
    +    // It is initialized to `false` and it is set to `true` when the first condition which
    +    // evaluates to `true` is met and therefore is not needed to go on anymore on the computation
    +    // of the following conditions.
    +    val conditionMet = ctx.freshName("caseWhenConditionMet")
    +    ctx.addMutableState(ctx.JAVA_BOOLEAN, ev.isNull, "")
    --- End diff --
    
    thanks, I branched from a version when there was no default value. I merged and fixed it.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19752: [SPARK-22520][SQL] Support code generation for large Cas...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19752
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19752: [SPARK-22520][SQL] Support code generation for la...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19752#discussion_r153237434
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
    @@ -158,111 +178,86 @@ abstract class CaseWhenBase(
         val elseCase = elseValue.map(" ELSE " + _.sql).getOrElse("")
         "CASE" + cases + elseCase + " END"
       }
    -}
    -
    -
    -/**
    - * Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END".
    - * When a = true, returns b; when c = true, returns d; else returns e.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -// scalastyle:off line.size.limit
    -@ExpressionDescription(
    -  usage = "CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END - When `expr1` = true, returns `expr2`; else when `expr3` = true, returns `expr4`; else returns `expr5`.",
    -  arguments = """
    -    Arguments:
    -      * expr1, expr3 - the branch condition expressions should all be boolean type.
    -      * expr2, expr4, expr5 - the branch value expressions and else value expression should all be
    -          same type or coercible to a common type.
    -  """,
    -  examples = """
    -    Examples:
    -      > SELECT CASE WHEN 1 > 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       1
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       2
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 ELSE null END;
    -       NULL
    -  """)
    -// scalastyle:on line.size.limit
    -case class CaseWhen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with CodegenFallback with Serializable {
    -
    -  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    super[CodegenFallback].doGenCode(ctx, ev)
    -  }
    -
    -  def toCodegen(): CaseWhenCodegen = {
    -    CaseWhenCodegen(branches, elseValue)
    -  }
    -}
    -
    -/**
    - * CaseWhen expression used when code generation condition is satisfied.
    - * OptimizeCodegen optimizer replaces CaseWhen into CaseWhenCodegen.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -case class CaseWhenCodegen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with Serializable {
     
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    // Generate code that looks like:
    -    //
    -    // condA = ...
    -    // if (condA) {
    -    //   valueA
    -    // } else {
    -    //   condB = ...
    -    //   if (condB) {
    -    //     valueB
    -    //   } else {
    -    //     condC = ...
    -    //     if (condC) {
    -    //       valueC
    -    //     } else {
    -    //       elseValue
    -    //     }
    -    //   }
    -    // }
    +    // This variable represents whether the first successful condition is met or not.
    +    // It is initialized to `false` and it is set to `true` when the first condition which
    +    // evaluates to `true` is met and therefore is not needed to go on anymore on the computation
    +    // of the following conditions.
    +    val conditionMet = ctx.freshName("caseWhenConditionMet")
    +    ctx.addMutableState(ctx.JAVA_BOOLEAN, ev.isNull)
    +    ctx.addMutableState(ctx.javaType(dataType), ev.value)
         val cases = branches.map { case (condExpr, valueExpr) =>
           val cond = condExpr.genCode(ctx)
           val res = valueExpr.genCode(ctx)
           s"""
    -        ${cond.code}
    -        if (!${cond.isNull} && ${cond.value}) {
    -          ${res.code}
    -          ${ev.isNull} = ${res.isNull};
    -          ${ev.value} = ${res.value};
    +        if(!$conditionMet) {
    +          ${cond.code}
    +          if (!${cond.isNull} && ${cond.value}) {
    +            ${res.code}
    +            ${ev.isNull} = ${res.isNull};
    +            ${ev.value} = ${res.value};
    +            $conditionMet = true;
    +          }
             }
           """
         }
     
    -    var generatedCode = cases.mkString("", "\nelse {\n", "\nelse {\n")
    -
    -    elseValue.foreach { elseExpr =>
    +    val elseCode = elseValue.map { elseExpr =>
           val res = elseExpr.genCode(ctx)
    -      generatedCode +=
    -        s"""
    +      s"""
    +        if(!$conditionMet) {
               ${res.code}
               ${ev.isNull} = ${res.isNull};
               ${ev.value} = ${res.value};
    -        """
    +        }
    +      """
         }
     
    -    generatedCode += "}\n" * cases.size
    +    val allConditions = cases ++ elseCode
    +
    +    val code = if (ctx.INPUT_ROW == null || ctx.currentVars != null) {
    +        allConditions.mkString("\n")
    +      } else {
    +        // This generates code like:
    +        // do {
    +        //   conditionMet = caseWhen_1(i);
    +        //   if(conditionMet) {
    +        //     continue;
    +        //   }
    +        //   conditionMet = caseWhen_2(i);
    +        //   if(conditionMet) {
    +        //     continue;
    +        //   }
    +        //   ...
    +        // } while (false);
    --- End diff --
    
    I am saying that the first one is wrong since it doesn't set `condition` (which can be returned at the end of the method as of #19480), since it may return wrong results.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19752: [SPARK-22520][SQL] Support code generation for large Cas...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19752
  
    **[Test build #84198 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84198/testReport)** for PR 19752 at commit [`f9c20be`](https://github.com/apache/spark/commit/f9c20bea19e1e03394a976c90012fc8744267065).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19752: [SPARK-22520][SQL] Support code generation for large Cas...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19752
  
    **[Test build #84229 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84229/testReport)** for PR 19752 at commit [`dd5f455`](https://github.com/apache/spark/commit/dd5f455541babc3b594d071f1aae8591cd01f8de).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19752: [SPARK-22520][SQL] Support code generation for la...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19752#discussion_r153225638
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
    @@ -158,111 +178,86 @@ abstract class CaseWhenBase(
         val elseCase = elseValue.map(" ELSE " + _.sql).getOrElse("")
         "CASE" + cases + elseCase + " END"
       }
    -}
    -
    -
    -/**
    - * Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END".
    - * When a = true, returns b; when c = true, returns d; else returns e.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -// scalastyle:off line.size.limit
    -@ExpressionDescription(
    -  usage = "CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END - When `expr1` = true, returns `expr2`; else when `expr3` = true, returns `expr4`; else returns `expr5`.",
    -  arguments = """
    -    Arguments:
    -      * expr1, expr3 - the branch condition expressions should all be boolean type.
    -      * expr2, expr4, expr5 - the branch value expressions and else value expression should all be
    -          same type or coercible to a common type.
    -  """,
    -  examples = """
    -    Examples:
    -      > SELECT CASE WHEN 1 > 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       1
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       2
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 ELSE null END;
    -       NULL
    -  """)
    -// scalastyle:on line.size.limit
    -case class CaseWhen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with CodegenFallback with Serializable {
    -
    -  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    super[CodegenFallback].doGenCode(ctx, ev)
    -  }
    -
    -  def toCodegen(): CaseWhenCodegen = {
    -    CaseWhenCodegen(branches, elseValue)
    -  }
    -}
    -
    -/**
    - * CaseWhen expression used when code generation condition is satisfied.
    - * OptimizeCodegen optimizer replaces CaseWhen into CaseWhenCodegen.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -case class CaseWhenCodegen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with Serializable {
     
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    // Generate code that looks like:
    -    //
    -    // condA = ...
    -    // if (condA) {
    -    //   valueA
    -    // } else {
    -    //   condB = ...
    -    //   if (condB) {
    -    //     valueB
    -    //   } else {
    -    //     condC = ...
    -    //     if (condC) {
    -    //       valueC
    -    //     } else {
    -    //       elseValue
    -    //     }
    -    //   }
    -    // }
    +    // This variable represents whether the first successful condition is met or not.
    +    // It is initialized to `false` and it is set to `true` when the first condition which
    +    // evaluates to `true` is met and therefore is not needed to go on anymore on the computation
    +    // of the following conditions.
    +    val conditionMet = ctx.freshName("caseWhenConditionMet")
    +    ctx.addMutableState(ctx.JAVA_BOOLEAN, ev.isNull)
    +    ctx.addMutableState(ctx.javaType(dataType), ev.value)
         val cases = branches.map { case (condExpr, valueExpr) =>
           val cond = condExpr.genCode(ctx)
           val res = valueExpr.genCode(ctx)
           s"""
    -        ${cond.code}
    -        if (!${cond.isNull} && ${cond.value}) {
    -          ${res.code}
    -          ${ev.isNull} = ${res.isNull};
    -          ${ev.value} = ${res.value};
    +        if(!$conditionMet) {
    +          ${cond.code}
    +          if (!${cond.isNull} && ${cond.value}) {
    +            ${res.code}
    +            ${ev.isNull} = ${res.isNull};
    +            ${ev.value} = ${res.value};
    +            $conditionMet = true;
    +          }
             }
           """
         }
     
    -    var generatedCode = cases.mkString("", "\nelse {\n", "\nelse {\n")
    -
    -    elseValue.foreach { elseExpr =>
    +    val elseCode = elseValue.map { elseExpr =>
           val res = elseExpr.genCode(ctx)
    -      generatedCode +=
    -        s"""
    +      s"""
    +        if(!$conditionMet) {
               ${res.code}
               ${ev.isNull} = ${res.isNull};
               ${ev.value} = ${res.value};
    -        """
    +        }
    +      """
         }
     
    -    generatedCode += "}\n" * cases.size
    +    val allConditions = cases ++ elseCode
    +
    +    val code = if (ctx.INPUT_ROW == null || ctx.currentVars != null) {
    +        allConditions.mkString("\n")
    +      } else {
    +        // This generates code like:
    +        // do {
    +        //   conditionMet = caseWhen_1(i);
    +        //   if(conditionMet) {
    +        //     continue;
    +        //   }
    +        //   conditionMet = caseWhen_2(i);
    +        //   if(conditionMet) {
    +        //     continue;
    +        //   }
    +        //   ...
    +        // } while (false);
    +        ctx.splitExpressions(allConditions, "caseWhen",
    +          ("InternalRow", ctx.INPUT_ROW) :: Nil,
    +          returnType = ctx.JAVA_BOOLEAN,
    +          makeSplitFunction = {
    +            func =>
    +              s"""
    +                ${ctx.JAVA_BOOLEAN} $conditionMet = false;
    +                $func
    +                return $conditionMet;
    --- End diff --
    
    I think in most cases we just split the codes into a few methods, which means, it's more important to apply the `do while` optimization inside the method(a method may have a lot of conditions checks), than between the methods.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19752: [SPARK-22520][SQL] Support code generation for large Cas...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19752
  
    LGTM except a few minor comments


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19752: [SPARK-22520][SQL] Support code generation for la...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19752#discussion_r153100483
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
    @@ -211,111 +231,73 @@ abstract class CaseWhenBase(
         val elseCase = elseValue.map(" ELSE " + _.sql).getOrElse("")
         "CASE" + cases + elseCase + " END"
       }
    -}
    -
    -
    -/**
    - * Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END".
    - * When a = true, returns b; when c = true, returns d; else returns e.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -// scalastyle:off line.size.limit
    -@ExpressionDescription(
    -  usage = "CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END - When `expr1` = true, returns `expr2`; else when `expr3` = true, returns `expr4`; else returns `expr5`.",
    -  arguments = """
    -    Arguments:
    -      * expr1, expr3 - the branch condition expressions should all be boolean type.
    -      * expr2, expr4, expr5 - the branch value expressions and else value expression should all be
    -          same type or coercible to a common type.
    -  """,
    -  examples = """
    -    Examples:
    -      > SELECT CASE WHEN 1 > 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       1
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       2
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 ELSE null END;
    -       NULL
    -  """)
    -// scalastyle:on line.size.limit
    -case class CaseWhen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with CodegenFallback with Serializable {
    -
    -  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    super[CodegenFallback].doGenCode(ctx, ev)
    -  }
    -
    -  def toCodegen(): CaseWhenCodegen = {
    -    CaseWhenCodegen(branches, elseValue)
    -  }
    -}
    -
    -/**
    - * CaseWhen expression used when code generation condition is satisfied.
    - * OptimizeCodegen optimizer replaces CaseWhen into CaseWhenCodegen.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -case class CaseWhenCodegen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with Serializable {
     
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    // Generate code that looks like:
    -    //
    -    // condA = ...
    -    // if (condA) {
    -    //   valueA
    -    // } else {
    -    //   condB = ...
    -    //   if (condB) {
    -    //     valueB
    -    //   } else {
    -    //     condC = ...
    -    //     if (condC) {
    -    //       valueC
    -    //     } else {
    -    //       elseValue
    -    //     }
    -    //   }
    -    // }
    +    // This variable represents whether the first successful condition is met or not.
    +    // It is initialized to `false` and it is set to `true` when the first condition which
    +    // evaluates to `true` is met and therefore is not needed to go on anymore on the computation
    +    // of the following conditions.
    +    val conditionMet = ctx.freshName("caseWhenConditionMet")
    +    ctx.addMutableState(ctx.JAVA_BOOLEAN, ev.isNull, "")
    +    ctx.addMutableState(ctx.javaType(dataType), ev.value, "")
         val cases = branches.map { case (condExpr, valueExpr) =>
           val cond = condExpr.genCode(ctx)
           val res = valueExpr.genCode(ctx)
           s"""
    -        ${cond.code}
    -        if (!${cond.isNull} && ${cond.value}) {
    -          ${res.code}
    -          ${ev.isNull} = ${res.isNull};
    -          ${ev.value} = ${res.value};
    +        if(!$conditionMet) {
    +          ${cond.code}
    +          if (!${cond.isNull} && ${cond.value}) {
    +            ${res.code}
    +            ${ev.isNull} = ${res.isNull};
    +            ${ev.value} = ${res.value};
    +            $conditionMet = true;
    +          }
             }
           """
         }
     
    -    var generatedCode = cases.mkString("", "\nelse {\n", "\nelse {\n")
    -
    -    elseValue.foreach { elseExpr =>
    +    val elseCode = elseValue.map { elseExpr =>
           val res = elseExpr.genCode(ctx)
    -      generatedCode +=
    -        s"""
    +      s"""
    +        if(!$conditionMet) {
               ${res.code}
               ${ev.isNull} = ${res.isNull};
               ${ev.value} = ${res.value};
    -        """
    +        }
    +      """
         }
     
    -    generatedCode += "}\n" * cases.size
    +    val allConditions = cases ++ elseCode
    +
    +    val code = if (ctx.INPUT_ROW == null || ctx.currentVars != null) {
    +        allConditions.mkString("\n")
    +      } else {
    +        ctx.splitExpressions(allConditions, "caseWhen",
    +          ("InternalRow", ctx.INPUT_ROW) :: (ctx.JAVA_BOOLEAN, conditionMet) :: Nil,
    +          returnType = ctx.JAVA_BOOLEAN,
    +          makeSplitFunction = {
    +            func =>
    +              s"""
    +                $func
    +                return $conditionMet;
    +              """
    +          },
    +          foldFunctions = { funcCalls =>
    +            funcCalls.map { funcCall =>
    +              s"""
    +                $conditionMet = $funcCall;
    +                if ($conditionMet) {
    +                  continue;
    +                }"""
    +            }.mkString("do {", "", "\n} while (false);")
    --- End diff --
    
    nit: do we need a `\n` after `do {`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19752: [SPARK-22520][SQL] Support code generation for large Cas...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19752
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19752: [SPARK-22520][SQL] Support code generation for la...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19752#discussion_r153079772
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
    @@ -211,111 +231,62 @@ abstract class CaseWhenBase(
         val elseCase = elseValue.map(" ELSE " + _.sql).getOrElse("")
         "CASE" + cases + elseCase + " END"
       }
    -}
    -
    -
    -/**
    - * Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END".
    - * When a = true, returns b; when c = true, returns d; else returns e.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -// scalastyle:off line.size.limit
    -@ExpressionDescription(
    -  usage = "CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END - When `expr1` = true, returns `expr2`; else when `expr3` = true, returns `expr4`; else returns `expr5`.",
    -  arguments = """
    -    Arguments:
    -      * expr1, expr3 - the branch condition expressions should all be boolean type.
    -      * expr2, expr4, expr5 - the branch value expressions and else value expression should all be
    -          same type or coercible to a common type.
    -  """,
    -  examples = """
    -    Examples:
    -      > SELECT CASE WHEN 1 > 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       1
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       2
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 ELSE null END;
    -       NULL
    -  """)
    -// scalastyle:on line.size.limit
    -case class CaseWhen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with CodegenFallback with Serializable {
    -
    -  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    super[CodegenFallback].doGenCode(ctx, ev)
    -  }
    -
    -  def toCodegen(): CaseWhenCodegen = {
    -    CaseWhenCodegen(branches, elseValue)
    -  }
    -}
    -
    -/**
    - * CaseWhen expression used when code generation condition is satisfied.
    - * OptimizeCodegen optimizer replaces CaseWhen into CaseWhenCodegen.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -case class CaseWhenCodegen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with Serializable {
     
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    // Generate code that looks like:
    -    //
    -    // condA = ...
    -    // if (condA) {
    -    //   valueA
    -    // } else {
    -    //   condB = ...
    -    //   if (condB) {
    -    //     valueB
    -    //   } else {
    -    //     condC = ...
    -    //     if (condC) {
    -    //       valueC
    -    //     } else {
    -    //       elseValue
    -    //     }
    -    //   }
    -    // }
    +    val conditionMet = ctx.freshName("caseWhenConditionMet")
    +    ctx.addMutableState("boolean", ev.isNull, "")
    +    ctx.addMutableState(ctx.javaType(dataType), ev.value, "")
         val cases = branches.map { case (condExpr, valueExpr) =>
           val cond = condExpr.genCode(ctx)
           val res = valueExpr.genCode(ctx)
           s"""
    -        ${cond.code}
    -        if (!${cond.isNull} && ${cond.value}) {
    -          ${res.code}
    -          ${ev.isNull} = ${res.isNull};
    -          ${ev.value} = ${res.value};
    +        if(!$conditionMet) {
    +          ${cond.code}
    +          if (!${cond.isNull} && ${cond.value}) {
    +            ${res.code}
    +            ${ev.isNull} = ${res.isNull};
    +            ${ev.value} = ${res.value};
    +            $conditionMet = true;
    +          }
             }
           """
         }
     
    -    var generatedCode = cases.mkString("", "\nelse {\n", "\nelse {\n")
    -
    -    elseValue.foreach { elseExpr =>
    +    val elseCode = elseValue.map { elseExpr =>
           val res = elseExpr.genCode(ctx)
    -      generatedCode +=
    -        s"""
    +      s"""
    +        if(!$conditionMet) {
               ${res.code}
               ${ev.isNull} = ${res.isNull};
               ${ev.value} = ${res.value};
    -        """
    +        }
    +      """
         }
     
    -    generatedCode += "}\n" * cases.size
    +    val allConditions = cases ++ elseCode
    +
    +    val code = if (ctx.INPUT_ROW == null || ctx.currentVars != null) {
    +      allConditions.mkString("\n")
    +    } else {
    +      ctx.splitExpressions(allConditions, "caseWhen",
    --- End diff --
    
    Style issue. Indent


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19752: [SPARK-22520][SQL] Support code generation for la...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19752#discussion_r153079554
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
    @@ -211,111 +231,62 @@ abstract class CaseWhenBase(
         val elseCase = elseValue.map(" ELSE " + _.sql).getOrElse("")
         "CASE" + cases + elseCase + " END"
       }
    -}
    -
    -
    -/**
    - * Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END".
    - * When a = true, returns b; when c = true, returns d; else returns e.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -// scalastyle:off line.size.limit
    -@ExpressionDescription(
    -  usage = "CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END - When `expr1` = true, returns `expr2`; else when `expr3` = true, returns `expr4`; else returns `expr5`.",
    -  arguments = """
    -    Arguments:
    -      * expr1, expr3 - the branch condition expressions should all be boolean type.
    -      * expr2, expr4, expr5 - the branch value expressions and else value expression should all be
    -          same type or coercible to a common type.
    -  """,
    -  examples = """
    -    Examples:
    -      > SELECT CASE WHEN 1 > 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       1
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       2
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 ELSE null END;
    -       NULL
    -  """)
    -// scalastyle:on line.size.limit
    -case class CaseWhen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with CodegenFallback with Serializable {
    -
    -  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    super[CodegenFallback].doGenCode(ctx, ev)
    -  }
    -
    -  def toCodegen(): CaseWhenCodegen = {
    -    CaseWhenCodegen(branches, elseValue)
    -  }
    -}
    -
    -/**
    - * CaseWhen expression used when code generation condition is satisfied.
    - * OptimizeCodegen optimizer replaces CaseWhen into CaseWhenCodegen.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -case class CaseWhenCodegen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with Serializable {
     
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    // Generate code that looks like:
    -    //
    -    // condA = ...
    -    // if (condA) {
    -    //   valueA
    -    // } else {
    -    //   condB = ...
    -    //   if (condB) {
    -    //     valueB
    -    //   } else {
    -    //     condC = ...
    -    //     if (condC) {
    -    //       valueC
    -    //     } else {
    -    //       elseValue
    -    //     }
    -    //   }
    -    // }
    +    val conditionMet = ctx.freshName("caseWhenConditionMet")
    +    ctx.addMutableState("boolean", ev.isNull, "")
    +    ctx.addMutableState(ctx.javaType(dataType), ev.value, "")
         val cases = branches.map { case (condExpr, valueExpr) =>
           val cond = condExpr.genCode(ctx)
           val res = valueExpr.genCode(ctx)
           s"""
    -        ${cond.code}
    -        if (!${cond.isNull} && ${cond.value}) {
    -          ${res.code}
    -          ${ev.isNull} = ${res.isNull};
    -          ${ev.value} = ${res.value};
    +        if(!$conditionMet) {
    +          ${cond.code}
    +          if (!${cond.isNull} && ${cond.value}) {
    +            ${res.code}
    +            ${ev.isNull} = ${res.isNull};
    +            ${ev.value} = ${res.value};
    +            $conditionMet = true;
    +          }
             }
           """
         }
     
    -    var generatedCode = cases.mkString("", "\nelse {\n", "\nelse {\n")
    -
    -    elseValue.foreach { elseExpr =>
    +    val elseCode = elseValue.map { elseExpr =>
           val res = elseExpr.genCode(ctx)
    -      generatedCode +=
    -        s"""
    +      s"""
    +        if(!$conditionMet) {
               ${res.code}
               ${ev.isNull} = ${res.isNull};
               ${ev.value} = ${res.value};
    -        """
    +        }
    +      """
         }
     
    -    generatedCode += "}\n" * cases.size
    +    val allConditions = cases ++ elseCode
    +
    +    val code = if (ctx.INPUT_ROW == null || ctx.currentVars != null) {
    +      allConditions.mkString("\n")
    +    } else {
    +      ctx.splitExpressions(allConditions, "caseWhen",
    +        ("InternalRow", ctx.INPUT_ROW) :: ("boolean", conditionMet) :: Nil, returnType = "boolean",
    +        makeSplitFunction = {
    +          func =>
    +            s"""
    +              $func
    +              return $conditionMet;
    +            """
    +        },
    +        foldFunctions = { funcCalls =>
    +          funcCalls.map(funcCall => s"$conditionMet = $funcCall;").mkString("\n")
    --- End diff --
    
    When `caseWhenConditionMet` is false, we do not need to call `funcCall `.  


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19752: [SPARK-22520][SQL] Support code generation for la...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19752#discussion_r153118326
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
    @@ -211,111 +231,73 @@ abstract class CaseWhenBase(
         val elseCase = elseValue.map(" ELSE " + _.sql).getOrElse("")
         "CASE" + cases + elseCase + " END"
       }
    -}
    -
    -
    -/**
    - * Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END".
    - * When a = true, returns b; when c = true, returns d; else returns e.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -// scalastyle:off line.size.limit
    -@ExpressionDescription(
    -  usage = "CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END - When `expr1` = true, returns `expr2`; else when `expr3` = true, returns `expr4`; else returns `expr5`.",
    -  arguments = """
    -    Arguments:
    -      * expr1, expr3 - the branch condition expressions should all be boolean type.
    -      * expr2, expr4, expr5 - the branch value expressions and else value expression should all be
    -          same type or coercible to a common type.
    -  """,
    -  examples = """
    -    Examples:
    -      > SELECT CASE WHEN 1 > 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       1
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       2
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 ELSE null END;
    -       NULL
    -  """)
    -// scalastyle:on line.size.limit
    -case class CaseWhen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with CodegenFallback with Serializable {
    -
    -  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    super[CodegenFallback].doGenCode(ctx, ev)
    -  }
    -
    -  def toCodegen(): CaseWhenCodegen = {
    -    CaseWhenCodegen(branches, elseValue)
    -  }
    -}
    -
    -/**
    - * CaseWhen expression used when code generation condition is satisfied.
    - * OptimizeCodegen optimizer replaces CaseWhen into CaseWhenCodegen.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -case class CaseWhenCodegen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with Serializable {
     
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    // Generate code that looks like:
    --- End diff --
    
    I don't think it is necessary since now the generated code is way easier and more standard and nowhere else a comment like this is provided. Anyway, if you feel it is needed, I can add it.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19752: [SPARK-22520][SQL] Support code generation for large Cas...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19752
  
    **[Test build #83866 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83866/testReport)** for PR 19752 at commit [`98eaae9`](https://github.com/apache/spark/commit/98eaae9436adf63ec3023ee077f2fff8e23dfa35).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class CaseWhen(`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19752: [SPARK-22520][SQL] Support code generation for large Cas...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19752
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84206/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19752: [SPARK-22520][SQL] Support code generation for large Cas...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/19752
  
    @gatorsmile I added a test case to check that the execution plan is `WholeStageCodegenExec` as expected. I also made some performance test using almost the same code, ie.:
    ```
    val N = 30
    val nRows = 1000000
    var expr1 = when($"id" === lit(0), 0)
    var expr2 = when($"id" === lit(0), 10)
    (1 to N).foreach { i =>
      expr1 = expr1.when($"id" === lit(i), -i)
      expr2 = expr2.when($"id" === lit(i + 10), i)
    }
    time { spark.range(nRows).select(expr1.as("c1"), expr2.otherwise(0).as("c2")).sort("c1").show }
    ```
    before this PR, it takes on average 1091.690996ms. After the PR, it takes on average 106.894443ns.
    
    Actually there is a problem which is fixed in #18641 and it is not fixed here, ie. when the code contains deeply nested exceptions, the 64KB limit exception can still happen. But this should be handled in a more generic way in #19813.
    
    @kiszk What do you think?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19752: [SPARK-22520][SQL] Support code generation for large Cas...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19752
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84212/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19752: [SPARK-22520][SQL] Support code generation for large Cas...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19752
  
    **[Test build #84212 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84212/testReport)** for PR 19752 at commit [`5adb513`](https://github.com/apache/spark/commit/5adb513ddba69e02b83aa47b69f45c73753a8457).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19752: [SPARK-22520][SQL] Support code generation for la...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19752#discussion_r153177401
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
    @@ -158,111 +178,86 @@ abstract class CaseWhenBase(
         val elseCase = elseValue.map(" ELSE " + _.sql).getOrElse("")
         "CASE" + cases + elseCase + " END"
       }
    -}
    -
    -
    -/**
    - * Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END".
    - * When a = true, returns b; when c = true, returns d; else returns e.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -// scalastyle:off line.size.limit
    -@ExpressionDescription(
    -  usage = "CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END - When `expr1` = true, returns `expr2`; else when `expr3` = true, returns `expr4`; else returns `expr5`.",
    -  arguments = """
    -    Arguments:
    -      * expr1, expr3 - the branch condition expressions should all be boolean type.
    -      * expr2, expr4, expr5 - the branch value expressions and else value expression should all be
    -          same type or coercible to a common type.
    -  """,
    -  examples = """
    -    Examples:
    -      > SELECT CASE WHEN 1 > 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       1
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       2
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 ELSE null END;
    -       NULL
    -  """)
    -// scalastyle:on line.size.limit
    -case class CaseWhen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with CodegenFallback with Serializable {
    -
    -  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    super[CodegenFallback].doGenCode(ctx, ev)
    -  }
    -
    -  def toCodegen(): CaseWhenCodegen = {
    -    CaseWhenCodegen(branches, elseValue)
    -  }
    -}
    -
    -/**
    - * CaseWhen expression used when code generation condition is satisfied.
    - * OptimizeCodegen optimizer replaces CaseWhen into CaseWhenCodegen.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -case class CaseWhenCodegen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with Serializable {
     
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    // Generate code that looks like:
    -    //
    -    // condA = ...
    -    // if (condA) {
    -    //   valueA
    -    // } else {
    -    //   condB = ...
    -    //   if (condB) {
    -    //     valueB
    -    //   } else {
    -    //     condC = ...
    -    //     if (condC) {
    -    //       valueC
    -    //     } else {
    -    //       elseValue
    -    //     }
    -    //   }
    -    // }
    +    // This variable represents whether the first successful condition is met or not.
    +    // It is initialized to `false` and it is set to `true` when the first condition which
    +    // evaluates to `true` is met and therefore is not needed to go on anymore on the computation
    +    // of the following conditions.
    +    val conditionMet = ctx.freshName("caseWhenConditionMet")
    +    ctx.addMutableState(ctx.JAVA_BOOLEAN, ev.isNull)
    +    ctx.addMutableState(ctx.javaType(dataType), ev.value)
         val cases = branches.map { case (condExpr, valueExpr) =>
           val cond = condExpr.genCode(ctx)
           val res = valueExpr.genCode(ctx)
           s"""
    -        ${cond.code}
    -        if (!${cond.isNull} && ${cond.value}) {
    -          ${res.code}
    -          ${ev.isNull} = ${res.isNull};
    -          ${ev.value} = ${res.value};
    +        if(!$conditionMet) {
    +          ${cond.code}
    +          if (!${cond.isNull} && ${cond.value}) {
    +            ${res.code}
    +            ${ev.isNull} = ${res.isNull};
    +            ${ev.value} = ${res.value};
    +            $conditionMet = true;
    +          }
             }
           """
         }
     
    -    var generatedCode = cases.mkString("", "\nelse {\n", "\nelse {\n")
    -
    -    elseValue.foreach { elseExpr =>
    +    val elseCode = elseValue.map { elseExpr =>
           val res = elseExpr.genCode(ctx)
    -      generatedCode +=
    -        s"""
    +      s"""
    +        if(!$conditionMet) {
               ${res.code}
               ${ev.isNull} = ${res.isNull};
               ${ev.value} = ${res.value};
    -        """
    +        }
    +      """
         }
     
    -    generatedCode += "}\n" * cases.size
    +    val allConditions = cases ++ elseCode
    +
    +    val code = if (ctx.INPUT_ROW == null || ctx.currentVars != null) {
    +        allConditions.mkString("\n")
    +      } else {
    +        // This generates code like:
    +        // do {
    +        //   conditionMet = caseWhen_1(i);
    +        //   if(conditionMet) {
    +        //     continue;
    +        //   }
    +        //   conditionMet = caseWhen_2(i);
    +        //   if(conditionMet) {
    +        //     continue;
    +        //   }
    +        //   ...
    +        // } while (false);
    --- End diff --
    
    Can we simplified to
    ```
    do {
      if (caseWhen_1(i)) {
        continue;
      }
      if (caseWhen_2(i)) {
        continue;
      }
    }
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19752: [SPARK-22520][SQL] Support code generation for la...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19752#discussion_r153177002
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
    @@ -158,111 +178,86 @@ abstract class CaseWhenBase(
         val elseCase = elseValue.map(" ELSE " + _.sql).getOrElse("")
         "CASE" + cases + elseCase + " END"
       }
    -}
    -
    -
    -/**
    - * Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END".
    - * When a = true, returns b; when c = true, returns d; else returns e.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -// scalastyle:off line.size.limit
    -@ExpressionDescription(
    -  usage = "CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END - When `expr1` = true, returns `expr2`; else when `expr3` = true, returns `expr4`; else returns `expr5`.",
    -  arguments = """
    -    Arguments:
    -      * expr1, expr3 - the branch condition expressions should all be boolean type.
    -      * expr2, expr4, expr5 - the branch value expressions and else value expression should all be
    -          same type or coercible to a common type.
    -  """,
    -  examples = """
    -    Examples:
    -      > SELECT CASE WHEN 1 > 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       1
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       2
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 ELSE null END;
    -       NULL
    -  """)
    -// scalastyle:on line.size.limit
    -case class CaseWhen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with CodegenFallback with Serializable {
    -
    -  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    super[CodegenFallback].doGenCode(ctx, ev)
    -  }
    -
    -  def toCodegen(): CaseWhenCodegen = {
    -    CaseWhenCodegen(branches, elseValue)
    -  }
    -}
    -
    -/**
    - * CaseWhen expression used when code generation condition is satisfied.
    - * OptimizeCodegen optimizer replaces CaseWhen into CaseWhenCodegen.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -case class CaseWhenCodegen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with Serializable {
     
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    // Generate code that looks like:
    -    //
    -    // condA = ...
    -    // if (condA) {
    -    //   valueA
    -    // } else {
    -    //   condB = ...
    -    //   if (condB) {
    -    //     valueB
    -    //   } else {
    -    //     condC = ...
    -    //     if (condC) {
    -    //       valueC
    -    //     } else {
    -    //       elseValue
    -    //     }
    -    //   }
    -    // }
    +    // This variable represents whether the first successful condition is met or not.
    +    // It is initialized to `false` and it is set to `true` when the first condition which
    +    // evaluates to `true` is met and therefore is not needed to go on anymore on the computation
    +    // of the following conditions.
    +    val conditionMet = ctx.freshName("caseWhenConditionMet")
    +    ctx.addMutableState(ctx.JAVA_BOOLEAN, ev.isNull)
    +    ctx.addMutableState(ctx.javaType(dataType), ev.value)
         val cases = branches.map { case (condExpr, valueExpr) =>
           val cond = condExpr.genCode(ctx)
           val res = valueExpr.genCode(ctx)
           s"""
    -        ${cond.code}
    -        if (!${cond.isNull} && ${cond.value}) {
    -          ${res.code}
    -          ${ev.isNull} = ${res.isNull};
    -          ${ev.value} = ${res.value};
    +        if(!$conditionMet) {
    +          ${cond.code}
    +          if (!${cond.isNull} && ${cond.value}) {
    +            ${res.code}
    +            ${ev.isNull} = ${res.isNull};
    +            ${ev.value} = ${res.value};
    +            $conditionMet = true;
    +          }
             }
           """
         }
     
    -    var generatedCode = cases.mkString("", "\nelse {\n", "\nelse {\n")
    -
    -    elseValue.foreach { elseExpr =>
    +    val elseCode = elseValue.map { elseExpr =>
           val res = elseExpr.genCode(ctx)
    -      generatedCode +=
    -        s"""
    +      s"""
    +        if(!$conditionMet) {
               ${res.code}
               ${ev.isNull} = ${res.isNull};
               ${ev.value} = ${res.value};
    -        """
    +        }
    +      """
         }
     
    -    generatedCode += "}\n" * cases.size
    +    val allConditions = cases ++ elseCode
    +
    +    val code = if (ctx.INPUT_ROW == null || ctx.currentVars != null) {
    +        allConditions.mkString("\n")
    +      } else {
    +        // This generates code like:
    +        // do {
    +        //   conditionMet = caseWhen_1(i);
    +        //   if(conditionMet) {
    +        //     continue;
    +        //   }
    +        //   conditionMet = caseWhen_2(i);
    +        //   if(conditionMet) {
    +        //     continue;
    +        //   }
    +        //   ...
    +        // } while (false);
    +        ctx.splitExpressions(allConditions, "caseWhen",
    +          ("InternalRow", ctx.INPUT_ROW) :: Nil,
    +          returnType = ctx.JAVA_BOOLEAN,
    +          makeSplitFunction = {
    +            func =>
    +              s"""
    +                ${ctx.JAVA_BOOLEAN} $conditionMet = false;
    +                $func
    +                return $conditionMet;
    --- End diff --
    
    shall we apply the same `do while` optimization here? i.e.
    ```
    do {
      conditionMet = caseWhen_1(i);
      if(conditionMet) {
        continue;
      }
    } while (false)
    
    boolean caseWhen_1(IntenralRow i) {
      do {
        if (!conditionMet) {
          code...
          set value and isnull...
          $conditionMet = true;
          continue;
        }
      }
    }
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19752: [SPARK-22520][SQL] Support code generation for large Cas...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19752
  
    thanks, merging to master!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19752: [SPARK-22520][SQL] Support code generation for large Cas...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19752
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19752: [SPARK-22520][SQL] Support code generation for large Cas...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/19752
  
    @kiszk actually checking your PR I think that the same issue addressed there would be handled also here by default. What do you think?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19752: [SPARK-22520][SQL] Support code generation for large Cas...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19752
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19752: [SPARK-22520][SQL] Support code generation for la...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19752#discussion_r153080028
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
    @@ -2126,4 +2126,17 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
         val mean = result.select("DecimalCol").where($"summary" === "mean")
         assert(mean.collect().toSet === Set(Row("0.0345678900000000000000000000000000000")))
       }
    +
    +  test("SPARK-22520: support code generation for large CaseWhen") {
    +    val N = 30
    +    var expr1 = when($"id" === lit(0), 0)
    +    var expr2 = when($"id" === lit(0), 10)
    +    (1 to N).foreach { i =>
    +      expr1 = expr1.when($"id" === lit(i), -i)
    +      expr2 = expr2.when($"id" === lit(i + 10), i)
    +    }
    +    val df = spark.range(1).select(expr1, expr2.otherwise(0))
    +    df.show
    --- End diff --
    
    compare the results?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19752: [SPARK-22520][SQL] Support code generation for la...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19752#discussion_r153079931
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
    @@ -141,14 +141,34 @@ case class If(predicate: Expression, trueValue: Expression, falseValue: Expressi
     }
     
     /**
    - * Abstract parent class for common logic in CaseWhen and CaseWhenCodegen.
    + * Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END".
    + * When a = true, returns b; when c = true, returns d; else returns e.
      *
      * @param branches seq of (branch condition, branch value)
      * @param elseValue optional value for the else branch
      */
    -abstract class CaseWhenBase(
    +// scalastyle:off line.size.limit
    +@ExpressionDescription(
    +  usage = "CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END - When `expr1` = true, returns `expr2`; else when `expr3` = true, returns `expr4`; else returns `expr5`.",
    +  arguments = """
    +    Arguments:
    +      * expr1, expr3 - the branch condition expressions should all be boolean type.
    +      * expr2, expr4, expr5 - the branch value expressions and else value expression should all be
    +          same type or coercible to a common type.
    +  """,
    +  examples = """
    +    Examples:
    +      > SELECT CASE WHEN 1 > 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    +       1
    +      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    +       2
    +      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 ELSE null END;
    --- End diff --
    
    > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 ELSE null END;
    ->
    > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 END;


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19752: [SPARK-22520][SQL] Support code generation for la...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19752#discussion_r152643842
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
    @@ -211,111 +231,61 @@ abstract class CaseWhenBase(
         val elseCase = elseValue.map(" ELSE " + _.sql).getOrElse("")
         "CASE" + cases + elseCase + " END"
       }
    -}
    -
    -
    -/**
    - * Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END".
    - * When a = true, returns b; when c = true, returns d; else returns e.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -// scalastyle:off line.size.limit
    -@ExpressionDescription(
    -  usage = "CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END - When `expr1` = true, returns `expr2`; else when `expr3` = true, returns `expr4`; else returns `expr5`.",
    -  arguments = """
    -    Arguments:
    -      * expr1, expr3 - the branch condition expressions should all be boolean type.
    -      * expr2, expr4, expr5 - the branch value expressions and else value expression should all be
    -          same type or coercible to a common type.
    -  """,
    -  examples = """
    -    Examples:
    -      > SELECT CASE WHEN 1 > 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       1
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       2
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 ELSE null END;
    -       NULL
    -  """)
    -// scalastyle:on line.size.limit
    -case class CaseWhen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with CodegenFallback with Serializable {
    -
    -  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    super[CodegenFallback].doGenCode(ctx, ev)
    -  }
    -
    -  def toCodegen(): CaseWhenCodegen = {
    -    CaseWhenCodegen(branches, elseValue)
    -  }
    -}
    -
    -/**
    - * CaseWhen expression used when code generation condition is satisfied.
    - * OptimizeCodegen optimizer replaces CaseWhen into CaseWhenCodegen.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -case class CaseWhenCodegen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with Serializable {
     
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    // Generate code that looks like:
    -    //
    -    // condA = ...
    -    // if (condA) {
    -    //   valueA
    -    // } else {
    -    //   condB = ...
    -    //   if (condB) {
    -    //     valueB
    -    //   } else {
    -    //     condC = ...
    -    //     if (condC) {
    -    //       valueC
    -    //     } else {
    -    //       elseValue
    -    //     }
    -    //   }
    -    // }
    +    val conditionMet = ctx.freshName("caseWhenConditionMet")
    +    ctx.addMutableState("boolean", ev.isNull, "")
    +    ctx.addMutableState(ctx.javaType(dataType), ev.value, "")
         val cases = branches.map { case (condExpr, valueExpr) =>
           val cond = condExpr.genCode(ctx)
           val res = valueExpr.genCode(ctx)
           s"""
    -        ${cond.code}
    -        if (!${cond.isNull} && ${cond.value}) {
    -          ${res.code}
    -          ${ev.isNull} = ${res.isNull};
    -          ${ev.value} = ${res.value};
    +        if(!$conditionMet) {
    +          ${cond.code}
    +          if (!${cond.isNull} && ${cond.value}) {
    +            ${res.code}
    +            ${ev.isNull} = ${res.isNull};
    +            ${ev.value} = ${res.value};
    +            $conditionMet = true;
    +          }
             }
           """
         }
     
    -    var generatedCode = cases.mkString("", "\nelse {\n", "\nelse {\n")
    -
    -    elseValue.foreach { elseExpr =>
    +    val elseCode = elseValue.map { elseExpr =>
           val res = elseExpr.genCode(ctx)
    -      generatedCode +=
    -        s"""
    +      s"""
    +        if(!$conditionMet) {
               ${res.code}
               ${ev.isNull} = ${res.isNull};
               ${ev.value} = ${res.value};
    -        """
    -    }
    +        }
    +      """
    +    }.getOrElse("")
     
    -    generatedCode += "}\n" * cases.size
    +    val casesCode = if (ctx.INPUT_ROW == null || ctx.currentVars != null) {
    +      cases.mkString("\n")
    +    } else {
    +      ctx.splitExpressions(cases, "caseWhen",
    --- End diff --
    
    Then, could you show us a test case? Can be a performance test if the function is hard to hit a 64KB limit. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19752: [SPARK-22520][SQL] Support code generation for large Cas...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19752
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19752: [SPARK-22520][SQL] Support code generation for la...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19752#discussion_r152538038
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
    @@ -211,111 +231,61 @@ abstract class CaseWhenBase(
         val elseCase = elseValue.map(" ELSE " + _.sql).getOrElse("")
         "CASE" + cases + elseCase + " END"
       }
    -}
    -
    -
    -/**
    - * Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END".
    - * When a = true, returns b; when c = true, returns d; else returns e.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -// scalastyle:off line.size.limit
    -@ExpressionDescription(
    -  usage = "CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END - When `expr1` = true, returns `expr2`; else when `expr3` = true, returns `expr4`; else returns `expr5`.",
    -  arguments = """
    -    Arguments:
    -      * expr1, expr3 - the branch condition expressions should all be boolean type.
    -      * expr2, expr4, expr5 - the branch value expressions and else value expression should all be
    -          same type or coercible to a common type.
    -  """,
    -  examples = """
    -    Examples:
    -      > SELECT CASE WHEN 1 > 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       1
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       2
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 ELSE null END;
    -       NULL
    -  """)
    -// scalastyle:on line.size.limit
    -case class CaseWhen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with CodegenFallback with Serializable {
    -
    -  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    super[CodegenFallback].doGenCode(ctx, ev)
    -  }
    -
    -  def toCodegen(): CaseWhenCodegen = {
    -    CaseWhenCodegen(branches, elseValue)
    -  }
    -}
    -
    -/**
    - * CaseWhen expression used when code generation condition is satisfied.
    - * OptimizeCodegen optimizer replaces CaseWhen into CaseWhenCodegen.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -case class CaseWhenCodegen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with Serializable {
     
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    // Generate code that looks like:
    -    //
    -    // condA = ...
    -    // if (condA) {
    -    //   valueA
    -    // } else {
    -    //   condB = ...
    -    //   if (condB) {
    -    //     valueB
    -    //   } else {
    -    //     condC = ...
    -    //     if (condC) {
    -    //       valueC
    -    //     } else {
    -    //       elseValue
    -    //     }
    -    //   }
    -    // }
    +    val conditionMet = ctx.freshName("caseWhenConditionMet")
    +    ctx.addMutableState("boolean", ev.isNull, "")
    +    ctx.addMutableState(ctx.javaType(dataType), ev.value, "")
         val cases = branches.map { case (condExpr, valueExpr) =>
           val cond = condExpr.genCode(ctx)
           val res = valueExpr.genCode(ctx)
           s"""
    -        ${cond.code}
    -        if (!${cond.isNull} && ${cond.value}) {
    -          ${res.code}
    -          ${ev.isNull} = ${res.isNull};
    -          ${ev.value} = ${res.value};
    +        if(!$conditionMet) {
    +          ${cond.code}
    +          if (!${cond.isNull} && ${cond.value}) {
    +            ${res.code}
    +            ${ev.isNull} = ${res.isNull};
    +            ${ev.value} = ${res.value};
    +            $conditionMet = true;
    +          }
             }
           """
         }
     
    -    var generatedCode = cases.mkString("", "\nelse {\n", "\nelse {\n")
    -
    -    elseValue.foreach { elseExpr =>
    +    val elseCode = elseValue.map { elseExpr =>
           val res = elseExpr.genCode(ctx)
    -      generatedCode +=
    -        s"""
    +      s"""
    +        if(!$conditionMet) {
               ${res.code}
               ${ev.isNull} = ${res.isNull};
               ${ev.value} = ${res.value};
    -        """
    -    }
    +        }
    +      """
    +    }.getOrElse("")
     
    -    generatedCode += "}\n" * cases.size
    +    val casesCode = if (ctx.INPUT_ROW == null || ctx.currentVars != null) {
    +      cases.mkString("\n")
    +    } else {
    +      ctx.splitExpressions(cases, "caseWhen",
    --- End diff --
    
    But I think that this implicitly covers also #18641, even though its main goal is another.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19752: [SPARK-22520][SQL] Support code generation for large Cas...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19752
  
    **[Test build #84224 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84224/testReport)** for PR 19752 at commit [`c7347b1`](https://github.com/apache/spark/commit/c7347b1565ed2697c0327218bd2c61cff59a8f33).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19752: [SPARK-22520][SQL] Support code generation for large Cas...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19752
  
    **[Test build #84168 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84168/testReport)** for PR 19752 at commit [`6225c8e`](https://github.com/apache/spark/commit/6225c8ecb00bab4fa892e7847f5aa9bdee54409b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19752: [SPARK-22520][SQL] Support code generation for large Cas...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19752
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84199/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19752: [SPARK-22520][SQL] Support code generation for la...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19752#discussion_r153124078
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
    @@ -158,111 +178,73 @@ abstract class CaseWhenBase(
         val elseCase = elseValue.map(" ELSE " + _.sql).getOrElse("")
         "CASE" + cases + elseCase + " END"
       }
    -}
    -
    -
    -/**
    - * Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END".
    - * When a = true, returns b; when c = true, returns d; else returns e.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -// scalastyle:off line.size.limit
    -@ExpressionDescription(
    -  usage = "CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END - When `expr1` = true, returns `expr2`; else when `expr3` = true, returns `expr4`; else returns `expr5`.",
    -  arguments = """
    -    Arguments:
    -      * expr1, expr3 - the branch condition expressions should all be boolean type.
    -      * expr2, expr4, expr5 - the branch value expressions and else value expression should all be
    -          same type or coercible to a common type.
    -  """,
    -  examples = """
    -    Examples:
    -      > SELECT CASE WHEN 1 > 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       1
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
    -       2
    -      > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 ELSE null END;
    -       NULL
    -  """)
    -// scalastyle:on line.size.limit
    -case class CaseWhen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with CodegenFallback with Serializable {
    -
    -  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    super[CodegenFallback].doGenCode(ctx, ev)
    -  }
    -
    -  def toCodegen(): CaseWhenCodegen = {
    -    CaseWhenCodegen(branches, elseValue)
    -  }
    -}
    -
    -/**
    - * CaseWhen expression used when code generation condition is satisfied.
    - * OptimizeCodegen optimizer replaces CaseWhen into CaseWhenCodegen.
    - *
    - * @param branches seq of (branch condition, branch value)
    - * @param elseValue optional value for the else branch
    - */
    -case class CaseWhenCodegen(
    -    val branches: Seq[(Expression, Expression)],
    -    val elseValue: Option[Expression] = None)
    -  extends CaseWhenBase(branches, elseValue) with Serializable {
     
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    -    // Generate code that looks like:
    -    //
    -    // condA = ...
    -    // if (condA) {
    -    //   valueA
    -    // } else {
    -    //   condB = ...
    -    //   if (condB) {
    -    //     valueB
    -    //   } else {
    -    //     condC = ...
    -    //     if (condC) {
    -    //       valueC
    -    //     } else {
    -    //       elseValue
    -    //     }
    -    //   }
    -    // }
    +    // This variable represents whether the first successful condition is met or not.
    +    // It is initialized to `false` and it is set to `true` when the first condition which
    +    // evaluates to `true` is met and therefore is not needed to go on anymore on the computation
    +    // of the following conditions.
    +    val conditionMet = ctx.freshName("caseWhenConditionMet")
    +    ctx.addMutableState(ctx.JAVA_BOOLEAN, ev.isNull)
    +    ctx.addMutableState(ctx.javaType(dataType), ev.value)
         val cases = branches.map { case (condExpr, valueExpr) =>
           val cond = condExpr.genCode(ctx)
           val res = valueExpr.genCode(ctx)
           s"""
    -        ${cond.code}
    -        if (!${cond.isNull} && ${cond.value}) {
    -          ${res.code}
    -          ${ev.isNull} = ${res.isNull};
    -          ${ev.value} = ${res.value};
    +        if(!$conditionMet) {
    +          ${cond.code}
    +          if (!${cond.isNull} && ${cond.value}) {
    +            ${res.code}
    +            ${ev.isNull} = ${res.isNull};
    +            ${ev.value} = ${res.value};
    +            $conditionMet = true;
    +          }
             }
           """
         }
     
    -    var generatedCode = cases.mkString("", "\nelse {\n", "\nelse {\n")
    -
    -    elseValue.foreach { elseExpr =>
    +    val elseCode = elseValue.map { elseExpr =>
           val res = elseExpr.genCode(ctx)
    -      generatedCode +=
    -        s"""
    +      s"""
    +        if(!$conditionMet) {
               ${res.code}
               ${ev.isNull} = ${res.isNull};
               ${ev.value} = ${res.value};
    -        """
    +        }
    +      """
         }
     
    -    generatedCode += "}\n" * cases.size
    +    val allConditions = cases ++ elseCode
    +
    +    val code = if (ctx.INPUT_ROW == null || ctx.currentVars != null) {
    +        allConditions.mkString("\n")
    +      } else {
    +        ctx.splitExpressions(allConditions, "caseWhen",
    +          ("InternalRow", ctx.INPUT_ROW) :: (ctx.JAVA_BOOLEAN, conditionMet) :: Nil,
    --- End diff --
    
    After the latest changes, `conditionMet ` becomes a local variable.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19752: [SPARK-22520][SQL] Support code generation for large Cas...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19752
  
    **[Test build #84168 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84168/testReport)** for PR 19752 at commit [`6225c8e`](https://github.com/apache/spark/commit/6225c8ecb00bab4fa892e7847f5aa9bdee54409b).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org