You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by kiszk <gi...@git.apache.org> on 2017/11/22 17:07:22 UTC

[GitHub] spark pull request #19797: [SPARK-22570][SQL] Avoid to create a lot of globa...

GitHub user kiszk opened a pull request:

    https://github.com/apache/spark/pull/19797

    [SPARK-22570][SQL] Avoid to create a lot of global variables to reuse an object in generated code

    ## What changes were proposed in this pull request?
    
    This PR reduces # of global variables in generated code by reusing an object that can be held in one global variable. When a lot of global variables were generated, the generated code may meet 64K constant pool limit.  
    This PR reduces # of generated global variables in the following three operations:
    * `Cast` with String to primitive byte/short/int/long
    * `RegExpReplace`
    * `CreateArray`
    
    I intentionally leave [this part](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala#L595-L603). This is because this variable keeps a class that is dynamically generated. In other word, it is not possible to reuse one class.
    
    ## How was this patch tested?
    
    Added test cases

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kiszk/spark SPARK-22570

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19797.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19797
    
----
commit 84610fb306713dfbe82ca72569fac09257b082bb
Author: Kazuaki Ishizaki <is...@jp.ibm.com>
Date:   2017-11-22T16:47:45Z

    initial commit for cast

commit 535160e63be200a722c7efba14a3d0befec47358
Author: Kazuaki Ishizaki <is...@jp.ibm.com>
Date:   2017-11-22T16:48:08Z

    initial commit for regexpreplace

commit 8ba2f59e209436c05479918fa7c5c70d6333b4b4
Author: Kazuaki Ishizaki <is...@jp.ibm.com>
Date:   2017-11-22T16:48:30Z

    createarray

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19797: [SPARK-22570][SQL] Avoid to create a lot of globa...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19797#discussion_r153973488
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/RegexpExpressionsSuite.scala ---
    @@ -178,6 +179,16 @@ class RegexpExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
         checkEvaluation(nonNullExpr, "num-num", row1)
       }
     
    +  test("SPARK-22570: should not create a lot of instance variables") {
    --- End diff --
    
    nit: `RegExpReplace should not create a lot of global variables`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19797: [SPARK-22570][SQL] Avoid to create a lot of global varia...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19797
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84314/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19797: [SPARK-22570][SQL] Avoid to create a lot of globa...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19797#discussion_r154071513
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala ---
    @@ -845,4 +846,11 @@ class CastSuite extends SparkFunSuite with ExpressionEvalHelper {
         val outputOuter = Row.fromSeq((1 to N).map(_ => outputInner))
         checkEvaluation(cast(Literal.create(inputOuter, fromOuter), toOuter), outputOuter)
       }
    +
    +  test("SPARK-22570: Cast should not create a lot of instance variables") {
    +    val ctx = new CodegenContext
    +    cast("1", IntegerType).genCode(ctx).code
    +    cast("2", LongType).genCode(ctx).code
    --- End diff --
    
    nit: no need to call `code`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19797: [SPARK-22570][SQL] Avoid to create a lot of global varia...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19797
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19797: [SPARK-22570][SQL] Avoid to create a lot of global varia...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19797
  
    **[Test build #84115 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84115/testReport)** for PR 19797 at commit [`7a49c41`](https://github.com/apache/spark/commit/7a49c41a2eb1ae933fbe611b0830c0f3b76af2dc).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19797: [SPARK-22570][SQL] Avoid to create a lot of global varia...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19797
  
    LGTM


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19797: [SPARK-22570][SQL] Avoid to create a lot of globa...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19797#discussion_r152766978
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
    @@ -851,9 +855,11 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
     
       private[this] def castToIntCode(from: DataType, ctx: CodegenContext): CastFunction = from match {
         case StringType =>
    -      val wrapper = ctx.freshName("wrapper")
    -      ctx.addMutableState("UTF8String.IntWrapper", wrapper,
    -        s"$wrapper = new UTF8String.IntWrapper();")
    +      val wrapper = "intWrapper"
    +      if (!ctx.mutableStates.exists(s => s._1 == wrapper)) {
    +        ctx.addMutableState("UTF8String.IntWrapper", wrapper,
    +          s"$wrapper = new UTF8String.IntWrapper();")
    +      }
           (c, evPrim, evNull) =>
             s"""
               if ($c.toInt($wrapper)) {
    --- End diff --
    
    shouldn't GC work very well in this case? My concern is, if there is no significant performance overhead, reusing global variables may be too hacky.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19797: [SPARK-22570][SQL] Avoid to create a lot of globa...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19797#discussion_r153973737
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala ---
    @@ -164,6 +165,12 @@ class ComplexTypesSuite extends PlanTest{
         comparePlans(Optimizer execute query, expected)
       }
     
    +  test("SPARK-22570: should not create a lot of instance variables") {
    --- End diff --
    
    nit: `Cast should not create a lot of global variables`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19797: [SPARK-22570][SQL] Avoid to create a lot of globa...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19797#discussion_r152825350
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
    @@ -851,9 +855,11 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
     
       private[this] def castToIntCode(from: DataType, ctx: CodegenContext): CastFunction = from match {
         case StringType =>
    -      val wrapper = ctx.freshName("wrapper")
    -      ctx.addMutableState("UTF8String.IntWrapper", wrapper,
    -        s"$wrapper = new UTF8String.IntWrapper();")
    +      val wrapper = "intWrapper"
    +      if (!ctx.mutableStates.exists(s => s._1 == wrapper)) {
    +        ctx.addMutableState("UTF8String.IntWrapper", wrapper,
    +          s"$wrapper = new UTF8String.IntWrapper();")
    +      }
           (c, evPrim, evNull) =>
             s"""
               if ($c.toInt($wrapper)) {
    --- End diff --
    
    I see. If we do not reuse global variables, we can use only a local variable.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19797: [SPARK-22570][SQL] Avoid to create a lot of global varia...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19797
  
    **[Test build #84133 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84133/testReport)** for PR 19797 at commit [`39f1b6d`](https://github.com/apache/spark/commit/39f1b6dce88bd0f016e0f13563aebb0a9b494569).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19797: [SPARK-22570][SQL] Avoid to create a lot of globa...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19797#discussion_r153383615
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala ---
    @@ -164,6 +165,28 @@ class ComplexTypesSuite extends PlanTest{
         comparePlans(Optimizer execute query, expected)
       }
     
    +  test("SPARK-22570: should not create a lot of instance variables") {
    +    val ctx = new CodegenContext
    --- End diff --
    
    something like this, we have a codegen context, and do codegen. After that, instead of compiling the code to make sure it doesn't fail, we can just check the size of `ctx.mutableStates` to confirm that we don't add global variables during codegen


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19797: [SPARK-22570][SQL] Avoid to create a lot of globa...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19797#discussion_r154074153
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/RegexpExpressionsSuite.scala ---
    @@ -20,6 +20,7 @@ package org.apache.spark.sql.catalyst.expressions
     import org.apache.spark.SparkFunSuite
     import org.apache.spark.sql.AnalysisException
     import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.expressions.codegen.{CodeAndComment, CodegenContext, CodeGenerator}
    --- End diff --
    
    `CodeAndComment` and `CodeGenerator` are not used.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19797: [SPARK-22570][SQL] Avoid to create a lot of globa...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19797#discussion_r152719657
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
    @@ -87,35 +87,41 @@ private [sql] object GenArrayData {
           elementType: DataType,
           elementsCode: Seq[ExprCode],
           isMapKey: Boolean): (String, Seq[String], String, String) = {
    -    val arrayName = ctx.freshName("array")
         val arrayDataName = ctx.freshName("arrayData")
         val numElements = elementsCode.length
     
         if (!ctx.isPrimitiveType(elementType)) {
    +      val arrayName = "arrayObject"
           val genericArrayClass = classOf[GenericArrayData].getName
    -      ctx.addMutableState("Object[]", arrayName,
    -        s"$arrayName = new Object[$numElements];")
    +      if (!ctx.mutableStates.exists(s => s._1 == arrayName)) {
    +        ctx.addMutableState("Object[]", arrayName)
    +      }
     
           val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
    -        val isNullAssignment = if (!isMapKey) {
    -          s"$arrayName[$i] = null;"
    +        val isNullAssignment = if (eval.isNull == "false") {
    +          ""
    --- End diff --
    
    This don't really simplify the code since you still have the `else` block below. It complicates the codes here actually. So I think it is better to be unchanged.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19797: [SPARK-22570][SQL] Avoid to create a lot of globa...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19797#discussion_r154071162
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
    @@ -110,17 +108,21 @@ private [sql] object GenArrayData {
              }
            """
           }
    +      val assignmentString = ctx.splitExpressions(
    +        expressions = assignments,
    +        funcName = "apply",
    +        extraArguments = ("Object[]", arrayDataName) :: Nil)
     
    -      ("",
    -       assignments,
    +      (s"Object[] $arrayName = new Object[$numElements];",
    +       assignmentString,
            s"final ArrayData $arrayDataName = new $genericArrayClass($arrayName);",
    --- End diff --
    
    to be consistent, shall we also set `arrayName` to null here?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19797: [SPARK-22570][SQL] Avoid to create a lot of globa...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19797#discussion_r153838284
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
    @@ -111,11 +110,12 @@ private [sql] object GenArrayData {
            """
           }
     
    -      ("",
    +      (s"$arrayName = new Object[$numElements];",
    --- End diff --
    
    To make it local variable, I would like to use a new API at #19821.
    After merging #19821, I will commit new version.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19797: [SPARK-22570][SQL] Avoid to create a lot of global varia...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19797
  
    thanks, merging to master!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19797: [SPARK-22570][SQL] Avoid to create a lot of globa...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19797#discussion_r153973840
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala ---
    @@ -845,4 +845,24 @@ class CastSuite extends SparkFunSuite with ExpressionEvalHelper {
         val outputOuter = Row.fromSeq((1 to N).map(_ => outputInner))
         checkEvaluation(cast(Literal.create(inputOuter, fromOuter), toOuter), outputOuter)
       }
    +
    +  test("SPARK-22570: should not create a lot of instance variables") {
    +    val N = 30000
    +
    +    val from1 = new StructType(
    +      (1 to N).map(i => StructField(s"s$i", StringType)).toArray)
    +    val to1 = new StructType(
    +      (1 to N).map(i => StructField(s"i$i", IntegerType)).toArray)
    +    val input1 = Row.fromSeq((1 to N).map(i => i.toString))
    +    val output1 = Row.fromSeq((1 to N))
    +    checkEvaluation(cast(Literal.create(input1, from1), to1), output1)
    +
    +    val from2 = new StructType(
    +      (1 to N).map(i => StructField(s"s$i", StringType)).toArray)
    +    val to2 = new StructType(
    +      (1 to N).map(i => StructField(s"i$i", LongType)).toArray)
    +    val input2 = Row.fromSeq((1 to N).map(i => i.toString))
    +    val output2 = Row.fromSeq((1 to N).map(i => i.toLong))
    +    checkEvaluation(cast(Literal.create(input2, from2), to2), output2)
    --- End diff --
    
    can we also simplify this test like the other 2?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19797: [SPARK-22570][SQL] Avoid to create a lot of global varia...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19797
  
    **[Test build #84314 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84314/testReport)** for PR 19797 at commit [`36a330d`](https://github.com/apache/spark/commit/36a330ddf98360196df45d041255b8b70fab5450).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19797: [SPARK-22570][SQL] Avoid to create a lot of globa...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19797#discussion_r153972998
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
    @@ -63,7 +63,7 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
         val (preprocess, assigns, postprocess, arrayData) =
           GenArrayData.genCodeToCreateArrayData(ctx, et, evals, false)
         ev.copy(
    -      code = preprocess + ctx.splitExpressions(assigns) + postprocess,
    +      code = preprocess + assigns + postprocess,
    --- End diff --
    
    why?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19797: [SPARK-22570][SQL] Avoid to create a lot of globa...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19797#discussion_r153973694
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala ---
    @@ -164,6 +165,12 @@ class ComplexTypesSuite extends PlanTest{
         comparePlans(Optimizer execute query, expected)
       }
     
    +  test("SPARK-22570: should not create a lot of instance variables") {
    +    val ctx = new CodegenContext
    +    (1 to 60000).map(i => CreateArray(Seq(Literal(s"$i"))).genCode(ctx).code)
    --- End diff --
    
    why loop?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19797: [SPARK-22570][SQL] Avoid to create a lot of global varia...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19797
  
    **[Test build #84324 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84324/testReport)** for PR 19797 at commit [`3b8459d`](https://github.com/apache/spark/commit/3b8459d4b50f2a923019c74c0800ceee2a9da7d5).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19797: [SPARK-22570][SQL] Avoid to create a lot of globa...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19797#discussion_r152719475
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
    @@ -799,9 +799,11 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
     
       private[this] def castToByteCode(from: DataType, ctx: CodegenContext): CastFunction = from match {
         case StringType =>
    -      val wrapper = ctx.freshName("wrapper")
    -      ctx.addMutableState("UTF8String.IntWrapper", wrapper,
    -        s"$wrapper = new UTF8String.IntWrapper();")
    +      val wrapper = "intWrapper"
    +      if (!ctx.mutableStates.exists(s => s._1 == wrapper)) {
    +        ctx.addMutableState("UTF8String.IntWrapper", wrapper,
    +          s"$wrapper = new UTF8String.IntWrapper();")
    +      }
    --- End diff --
    
    Add a help method to `CodegenContext`? Like `reuseOrAddMutableState`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19797: [SPARK-22570][SQL] Avoid to create a lot of global varia...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19797
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19797: [SPARK-22570][SQL] Avoid to create a lot of globa...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19797#discussion_r152664308
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
    @@ -851,9 +855,11 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
     
       private[this] def castToIntCode(from: DataType, ctx: CodegenContext): CastFunction = from match {
         case StringType =>
    -      val wrapper = ctx.freshName("wrapper")
    -      ctx.addMutableState("UTF8String.IntWrapper", wrapper,
    -        s"$wrapper = new UTF8String.IntWrapper();")
    +      val wrapper = "intWrapper"
    +      if (!ctx.mutableStates.exists(s => s._1 == wrapper)) {
    +        ctx.addMutableState("UTF8String.IntWrapper", wrapper,
    +          s"$wrapper = new UTF8String.IntWrapper();")
    +      }
           (c, evPrim, evNull) =>
             s"""
               if ($c.toInt($wrapper)) {
    --- End diff --
    
    what if we create a new wrapper every time?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19797: [SPARK-22570][SQL] Avoid to create a lot of globa...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19797#discussion_r152866870
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
    @@ -87,14 +87,13 @@ private [sql] object GenArrayData {
           elementType: DataType,
           elementsCode: Seq[ExprCode],
           isMapKey: Boolean): (String, Seq[String], String, String) = {
    -    val arrayName = ctx.freshName("array")
         val arrayDataName = ctx.freshName("arrayData")
         val numElements = elementsCode.length
     
         if (!ctx.isPrimitiveType(elementType)) {
    +      val arrayName = "arrayObject"
           val genericArrayClass = classOf[GenericArrayData].getName
    -      ctx.addMutableState("Object[]", arrayName,
    -        s"$arrayName = new Object[$numElements];")
    +      ctx.reuseOrAddMutableState("Object[]", arrayName)
    --- End diff --
    
    sorry, I understand only now your comment there. Honestly I may have a partial vision of this then please correct me if I am wrong, but my preference is for local variable for two reasons:
     - in the case we have a lot of methods and we have inner classes containing the methods, using a variable of the outer class adds (at least) a constant pool entry, while using a local variable doesn't (as per my understanding);
     - since we reinitialize every time the object with new assignments, I think we are not saving any GC (we are reusing only the pointer basically, which I don't think is a big problem).
    What do you think?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19797: [SPARK-22570][SQL] Avoid to create a lot of global varia...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19797
  
    **[Test build #84337 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84337/testReport)** for PR 19797 at commit [`a455ac3`](https://github.com/apache/spark/commit/a455ac3bf5aa49a15ef80fc7807078eb41bbfe74).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19797: [SPARK-22570][SQL] Avoid to create a lot of global varia...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19797
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84115/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19797: [SPARK-22570][SQL] Avoid to create a lot of global varia...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19797
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84341/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19797: [SPARK-22570][SQL] Avoid to create a lot of globa...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19797#discussion_r152712011
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
    @@ -851,9 +855,11 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
     
       private[this] def castToIntCode(from: DataType, ctx: CodegenContext): CastFunction = from match {
         case StringType =>
    -      val wrapper = ctx.freshName("wrapper")
    -      ctx.addMutableState("UTF8String.IntWrapper", wrapper,
    -        s"$wrapper = new UTF8String.IntWrapper();")
    +      val wrapper = "intWrapper"
    +      if (!ctx.mutableStates.exists(s => s._1 == wrapper)) {
    +        ctx.addMutableState("UTF8String.IntWrapper", wrapper,
    +          s"$wrapper = new UTF8String.IntWrapper();")
    +      }
           (c, evPrim, evNull) =>
             s"""
               if ($c.toInt($wrapper)) {
    --- End diff --
    
    It would work well, too. We may have some overhead to create a small object `IntWrapper` and collect it at GC.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19797: [SPARK-22570][SQL] Avoid to create a lot of globa...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19797#discussion_r153185862
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
    @@ -111,11 +110,12 @@ private [sql] object GenArrayData {
            """
           }
     
    -      ("",
    +      (s"$arrayName = new Object[$numElements];",
    --- End diff --
    
    we still create an object array every time here, which doesn't help with the GC. I think we can just create a local variable here.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19797: [SPARK-22570][SQL] Avoid to create a lot of global varia...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19797
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84113/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19797: [SPARK-22570][SQL] Avoid to create a lot of global varia...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19797
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84324/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19797: [SPARK-22570][SQL] Avoid to create a lot of globa...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19797#discussion_r154071572
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala ---
    @@ -164,6 +165,12 @@ class ComplexTypesSuite extends PlanTest{
         comparePlans(Optimizer execute query, expected)
       }
     
    +  test("SPARK-22570: CreateArray should not create a lot of global variables") {
    +    val ctx = new CodegenContext
    +    CreateArray(Seq(Literal(1))).genCode(ctx).code
    --- End diff --
    
    ditto


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19797: [SPARK-22570][SQL] Avoid to create a lot of globa...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19797#discussion_r153837838
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -173,6 +173,23 @@ class CodegenContext {
         mutableStates += ((javaType, variableName, initCode))
       }
     
    +  /**
    +   * Add a mutable state as a field to the generated class if the name has not been added
    +   *
    +   * @param javaType Java type of the field.
    +   * @param variableName Name of the field.
    +   * @param initCode The statement(s) to put into the init() method to initialize this field.
    +   *                 If left blank, the field will be default-initialized.
    +   */
    +  def reuseOrAddMutableState(
    --- End diff --
    
    I see. I will drop this.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19797: [SPARK-22570][SQL] Avoid to create a lot of global varia...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19797
  
    **[Test build #84124 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84124/testReport)** for PR 19797 at commit [`4d9657a`](https://github.com/apache/spark/commit/4d9657ada452f8fee85e89818026cf15aea3aafc).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19797: [SPARK-22570][SQL] Avoid to create a lot of global varia...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19797
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84124/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19797: [SPARK-22570][SQL] Avoid to create a lot of globa...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19797#discussion_r153185397
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -173,6 +173,23 @@ class CodegenContext {
         mutableStates += ((javaType, variableName, initCode))
       }
     
    +  /**
    +   * Add a mutable state as a field to the generated class if the name has not been added
    +   *
    +   * @param javaType Java type of the field.
    +   * @param variableName Name of the field.
    +   * @param initCode The statement(s) to put into the init() method to initialize this field.
    +   *                 If left blank, the field will be default-initialized.
    +   */
    +  def reuseOrAddMutableState(
    --- End diff --
    
    let's not do this, it's too hacky and error-prone. We should just create the object every time, or create global variable every time.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19797: [SPARK-22570][SQL] Avoid to create a lot of global varia...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19797
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84133/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19797: [SPARK-22570][SQL] Avoid to create a lot of global varia...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19797
  
    **[Test build #84341 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84341/testReport)** for PR 19797 at commit [`747e215`](https://github.com/apache/spark/commit/747e215a06856ae8a8ed49a784ae1327ec703966).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19797: [SPARK-22570][SQL] Avoid to create a lot of globa...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19797#discussion_r152627225
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
    @@ -87,31 +87,36 @@ private [sql] object GenArrayData {
           elementType: DataType,
           elementsCode: Seq[ExprCode],
           isMapKey: Boolean): (String, Seq[String], String, String) = {
    -    val arrayName = ctx.freshName("array")
    +    val arrayName = "array"
         val arrayDataName = ctx.freshName("arrayData")
         val numElements = elementsCode.length
     
         if (!ctx.isPrimitiveType(elementType)) {
           val genericArrayClass = classOf[GenericArrayData].getName
    -      ctx.addMutableState("Object[]", arrayName,
    -        s"$arrayName = new Object[$numElements];")
    +      if (!ctx.mutableStates.exists(s => s._1 == arrayName)) {
    --- End diff --
    
    To be honest, I do not know whether to stop reusing an object is good or not.
    @cloud-fan @maropu @viirya I would like to hear your opinions.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19797: [SPARK-22570][SQL] Avoid to create a lot of globa...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19797#discussion_r154073872
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala ---
    @@ -845,4 +846,11 @@ class CastSuite extends SparkFunSuite with ExpressionEvalHelper {
         val outputOuter = Row.fromSeq((1 to N).map(_ => outputInner))
         checkEvaluation(cast(Literal.create(inputOuter, fromOuter), toOuter), outputOuter)
       }
    +
    +  test("SPARK-22570: Cast should not create a lot of instance variables") {
    --- End diff --
    
    nit: instance variables -> global variables. To match with other two added tests.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19797: [SPARK-22570][SQL] Avoid to create a lot of global varia...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19797
  
    **[Test build #84337 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84337/testReport)** for PR 19797 at commit [`a455ac3`](https://github.com/apache/spark/commit/a455ac3bf5aa49a15ef80fc7807078eb41bbfe74).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19797: [SPARK-22570][SQL] Avoid to create a lot of global varia...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19797
  
    **[Test build #84133 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84133/testReport)** for PR 19797 at commit [`39f1b6d`](https://github.com/apache/spark/commit/39f1b6dce88bd0f016e0f13563aebb0a9b494569).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19797: [SPARK-22570][SQL] Avoid to create a lot of global varia...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19797
  
    **[Test build #84341 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84341/testReport)** for PR 19797 at commit [`747e215`](https://github.com/apache/spark/commit/747e215a06856ae8a8ed49a784ae1327ec703966).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19797: [SPARK-22570][SQL] Avoid to create a lot of global varia...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19797
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19797: [SPARK-22570][SQL] Avoid to create a lot of global varia...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19797
  
    **[Test build #84324 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84324/testReport)** for PR 19797 at commit [`3b8459d`](https://github.com/apache/spark/commit/3b8459d4b50f2a923019c74c0800ceee2a9da7d5).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19797: [SPARK-22570][SQL] Avoid to create a lot of globa...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/19797


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19797: [SPARK-22570][SQL] Avoid to create a lot of global varia...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19797
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84337/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19797: [SPARK-22570][SQL] Avoid to create a lot of globa...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19797#discussion_r154074197
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala ---
    @@ -20,6 +20,7 @@ package org.apache.spark.sql.catalyst.optimizer
     import org.apache.spark.sql.catalyst.dsl.expressions._
     import org.apache.spark.sql.catalyst.dsl.plans._
     import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.expressions.codegen.{CodeAndComment, CodegenContext, CodeGenerator}
    --- End diff --
    
    ditto.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19797: [SPARK-22570][SQL] Avoid to create a lot of global varia...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19797
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19797: [SPARK-22570][SQL] Avoid to create a lot of global varia...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19797
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19797: [SPARK-22570][SQL] Avoid to create a lot of global varia...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19797
  
    **[Test build #84115 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84115/testReport)** for PR 19797 at commit [`7a49c41`](https://github.com/apache/spark/commit/7a49c41a2eb1ae933fbe611b0830c0f3b76af2dc).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19797: [SPARK-22570][SQL] Avoid to create a lot of global varia...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/19797
  
    LGTM


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19797: [SPARK-22570][SQL] Avoid to create a lot of globa...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19797#discussion_r153187131
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala ---
    @@ -845,4 +845,24 @@ class CastSuite extends SparkFunSuite with ExpressionEvalHelper {
         val outputOuter = Row.fromSeq((1 to N).map(_ => outputInner))
         checkEvaluation(cast(Literal.create(inputOuter, fromOuter), toOuter), outputOuter)
       }
    +
    +  test("SPARK-22570: should not create a lot of instance variables") {
    --- End diff --
    
    In the future we may fold global variables to arrays, and these tests become useless. I think a more reliable way is: check the number of global variables in `CodegenContext`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19797: [SPARK-22570][SQL] Avoid to create a lot of globa...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19797#discussion_r152865836
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
    @@ -87,14 +87,13 @@ private [sql] object GenArrayData {
           elementType: DataType,
           elementsCode: Seq[ExprCode],
           isMapKey: Boolean): (String, Seq[String], String, String) = {
    -    val arrayName = ctx.freshName("array")
         val arrayDataName = ctx.freshName("arrayData")
         val numElements = elementsCode.length
     
         if (!ctx.isPrimitiveType(elementType)) {
    +      val arrayName = "arrayObject"
           val genericArrayClass = classOf[GenericArrayData].getName
    -      ctx.addMutableState("Object[]", arrayName,
    -        s"$arrayName = new Object[$numElements];")
    +      ctx.reuseOrAddMutableState("Object[]", arrayName)
    --- End diff --
    
    It is [my question](https://github.com/apache/spark/pull/19797#discussion_r152627225).
    Which one do you suggest?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19797: [SPARK-22570][SQL] Avoid to create a lot of global varia...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19797
  
    **[Test build #84314 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84314/testReport)** for PR 19797 at commit [`36a330d`](https://github.com/apache/spark/commit/36a330ddf98360196df45d041255b8b70fab5450).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19797: [SPARK-22570][SQL] Avoid to create a lot of globa...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19797#discussion_r153380713
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala ---
    @@ -845,4 +845,24 @@ class CastSuite extends SparkFunSuite with ExpressionEvalHelper {
         val outputOuter = Row.fromSeq((1 to N).map(_ => outputInner))
         checkEvaluation(cast(Literal.create(inputOuter, fromOuter), toOuter), outputOuter)
       }
    +
    +  test("SPARK-22570: should not create a lot of instance variables") {
    --- End diff --
    
    Could you elaborate this thought?  
    Do you want to this new check in test cases? Or do you mean we will remove these tests for global variables since we will have new check in another PR?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19797: [SPARK-22570][SQL] Avoid to create a lot of globa...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19797#discussion_r153186151
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ---
    @@ -334,7 +334,7 @@ case class RegExpReplace(subject: Expression, regexp: Expression, rep: Expressio
         ctx.addMutableState("String", termLastReplacement, s"${termLastReplacement} = null;")
         ctx.addMutableState("UTF8String",
           termLastReplacementInUTF8, s"${termLastReplacementInUTF8} = null;")
    -    ctx.addMutableState(classNameStringBuffer,
    +    ctx.reuseOrAddMutableState(classNameStringBuffer,
           termResult, s"${termResult} = new $classNameStringBuffer();")
    --- End diff --
    
    seems we can just create the string buffer every time, the overhead is small.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19797: [SPARK-22570][SQL] Avoid to create a lot of globa...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19797#discussion_r153973665
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/RegexpExpressionsSuite.scala ---
    @@ -178,6 +179,16 @@ class RegexpExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
         checkEvaluation(nonNullExpr, "num-num", row1)
       }
     
    +  test("SPARK-22570: should not create a lot of instance variables") {
    +    val N = 16000
    +    val expr = RegExpReplace(Literal("100"), Literal("(\\d+)"), Literal("num"))
    +    val ctx = new CodegenContext
    +    (1 to N).map(_ => expr.genCode(ctx).code)
    +    // four global variables (lastRegex, pattern, lastReplacement, and lastReplacementInUTF8)
    +    // are always required
    +    assert(ctx.mutableStates.length == 4 * N)
    --- End diff --
    
    I think we can simplify it to
    ```
    val ctx = new CodegenContext
    RegExpReplace(Literal("100"), Literal("(\\d+)"), Literal("num")).genCode(ctx)
    // ...
    assert(ctx.mutableStates.length == 4)
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19797: [SPARK-22570][SQL] Avoid to create a lot of global varia...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19797
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19797: [SPARK-22570][SQL] Avoid to create a lot of global varia...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19797
  
    **[Test build #84124 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84124/testReport)** for PR 19797 at commit [`4d9657a`](https://github.com/apache/spark/commit/4d9657ada452f8fee85e89818026cf15aea3aafc).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19797: [SPARK-22570][SQL] Avoid to create a lot of globa...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19797#discussion_r152719412
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
    @@ -799,9 +799,11 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
     
       private[this] def castToByteCode(from: DataType, ctx: CodegenContext): CastFunction = from match {
         case StringType =>
    -      val wrapper = ctx.freshName("wrapper")
    -      ctx.addMutableState("UTF8String.IntWrapper", wrapper,
    -        s"$wrapper = new UTF8String.IntWrapper();")
    +      val wrapper = "intWrapper"
    --- End diff --
    
    Should we worry about name collision?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19797: [SPARK-22570][SQL] Avoid to create a lot of globa...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19797#discussion_r152719853
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
    @@ -799,9 +799,11 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
     
       private[this] def castToByteCode(from: DataType, ctx: CodegenContext): CastFunction = from match {
         case StringType =>
    -      val wrapper = ctx.freshName("wrapper")
    -      ctx.addMutableState("UTF8String.IntWrapper", wrapper,
    -        s"$wrapper = new UTF8String.IntWrapper();")
    +      val wrapper = "intWrapper"
    --- End diff --
    
    Do you worry about name collision among different methods? Since the lifetime of this object is very short, I intentionally use the same name for the same type (`IntWrapper` or `LongWrapper`) to reuse the object.
    We could use different names among different methods.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19797: [SPARK-22570][SQL] Avoid to create a lot of globa...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19797#discussion_r152864775
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
    @@ -87,14 +87,13 @@ private [sql] object GenArrayData {
           elementType: DataType,
           elementsCode: Seq[ExprCode],
           isMapKey: Boolean): (String, Seq[String], String, String) = {
    -    val arrayName = ctx.freshName("array")
         val arrayDataName = ctx.freshName("arrayData")
         val numElements = elementsCode.length
     
         if (!ctx.isPrimitiveType(elementType)) {
    +      val arrayName = "arrayObject"
           val genericArrayClass = classOf[GenericArrayData].getName
    -      ctx.addMutableState("Object[]", arrayName,
    -        s"$arrayName = new Object[$numElements];")
    +      ctx.reuseOrAddMutableState("Object[]", arrayName)
    --- End diff --
    
    is there any specific reason why here we are not following the approach used for `IntWrapper` and `LongWrapper` above? ie. defining it as a local variable, instead of reusing the same global object?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19797: [SPARK-22570][SQL] Avoid to create a lot of global varia...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19797
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19797: [SPARK-22570][SQL] Avoid to create a lot of global varia...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19797
  
    **[Test build #84113 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84113/testReport)** for PR 19797 at commit [`8ba2f59`](https://github.com/apache/spark/commit/8ba2f59e209436c05479918fa7c5c70d6333b4b4).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19797: [SPARK-22570][SQL] Avoid to create a lot of global varia...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19797
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19797: [SPARK-22570][SQL] Avoid to create a lot of global varia...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19797
  
    **[Test build #84113 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84113/testReport)** for PR 19797 at commit [`8ba2f59`](https://github.com/apache/spark/commit/8ba2f59e209436c05479918fa7c5c70d6333b4b4).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org