You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by kiszk <gi...@git.apache.org> on 2017/11/24 16:36:00 UTC

[GitHub] spark pull request #19817: [SPARK-22603][SQL] Fix 64KB JVM bytecode limit pr...

GitHub user kiszk opened a pull request:

    https://github.com/apache/spark/pull/19817

    [SPARK-22603][SQL] Fix 64KB JVM bytecode limit problem with FormatString

    ## What changes were proposed in this pull request?
    
    This PR changes `FormatString` code generation to place generated code for expressions for arguments into separated methods if these size could be large.  
    This PR passes variable arguments by using an `Object` array.
    
    ## How was this patch tested?
    
    Added new test cases into `StringExpressionSuite`

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kiszk/spark SPARK-22603

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19817.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19817
    
----
commit e5bcbc41933a2bf81201b9e7c753b3405b30e0db
Author: Kazuaki Ishizaki <is...@jp.ibm.com>
Date:   2017-11-24T16:34:38Z

    initial commit

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19817: [SPARK-22603][SQL] Fix 64KB JVM bytecode limit problem w...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19817
  
    **[Test build #84180 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84180/testReport)** for PR 19817 at commit [`84c8bf4`](https://github.com/apache/spark/commit/84c8bf4fa068e3e5247f3b69367f7edba09e6684).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19817: [SPARK-22603][SQL] Fix 64KB JVM bytecode limit pr...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19817#discussion_r153065055
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
    @@ -1372,19 +1372,30 @@ case class FormatString(children: Expression*) extends Expression with ImplicitC
         val pattern = children.head.genCode(ctx)
     
         val argListGen = children.tail.map(x => (x.dataType, x.genCode(ctx)))
    -    val argListCode = argListGen.map(_._2.code + "\n")
    -
    -    val argListString = argListGen.foldLeft("")((s, v) => {
    -      val nullSafeString =
    +    val argList = ctx.freshName("argLists")
    +    val numArgLists = argListGen.length
    +    val argListCode = argListGen.zipWithIndex.map { case(v, index) =>
    +      val value =
             if (ctx.boxedType(v._1) != ctx.javaType(v._1)) {
               // Java primitives get boxed in order to allow null values.
               s"(${v._2.isNull}) ? (${ctx.boxedType(v._1)}) null : " +
                 s"new ${ctx.boxedType(v._1)}(${v._2.value})"
             } else {
               s"(${v._2.isNull}) ? null : ${v._2.value}"
             }
    -      s + "," + nullSafeString
    -    })
    +      s"""
    +         ${v._2.code}
    +         $argList[$index] = $value;
    +       """
    +    }
    +    val argListCodes = if (ctx.INPUT_ROW != null && ctx.currentVars == null) {
    +      ctx.splitExpressions(
    +        expressions = argListCode,
    +        funcName = "valueFormatString",
    +        arguments = ("InternalRow", ctx.INPUT_ROW) :: ("Object[]", argList) :: Nil)
    +    } else {
    +      argListCode.mkString("\n")
    +    }
    --- End diff --
    
    Thank you for your comment. After writing the above comment, I thought it would be good to create a separate PR.  
    Thus, I have just submitted a PR #19821 as WIP. That PR should wait for merging this PR and then I will do rebase for #19821.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19817: [SPARK-22603][SQL] Fix 64KB JVM bytecode limit problem w...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19817
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19817: [SPARK-22603][SQL] Fix 64KB JVM bytecode limit pr...

Posted by mgaido91 <gi...@git.apache.org>.

Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19817#discussion_r153012255
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
    @@ -1372,19 +1372,26 @@ case class FormatString(children: Expression*) extends Expression with ImplicitC
         val pattern = children.head.genCode(ctx)
     
         val argListGen = children.tail.map(x => (x.dataType, x.genCode(ctx)))
    -    val argListCode = argListGen.map(_._2.code + "\n")
    -
    -    val argListString = argListGen.foldLeft("")((s, v) => {
    -      val nullSafeString =
    -        if (ctx.boxedType(v._1) != ctx.javaType(v._1)) {
    -          // Java primitives get boxed in order to allow null values.
    -          s"(${v._2.isNull}) ? (${ctx.boxedType(v._1)}) null : " +
    -            s"new ${ctx.boxedType(v._1)}(${v._2.value})"
    -        } else {
    -          s"(${v._2.isNull}) ? null : ${v._2.value}"
    -        }
    -      s + "," + nullSafeString
    -    })
    +    val argList = ctx.freshName("argLists")
    +    val numArgLists = argListGen.length
    +    val argListCode = argListGen.zipWithIndex.map { case(v, index) =>
    +      v._2.code + s"\n$argList[$index] = " +
    --- End diff --
    
    would it be possible to use the syntax:
    ```
     s"""
      ...
      """
    ```
    as it is done in many other places? I think it would be more readable. What do you think?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19817: [SPARK-22603][SQL] Fix 64KB JVM bytecode limit pr...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/19817


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19817: [SPARK-22603][SQL] Fix 64KB JVM bytecode limit pr...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19817#discussion_r153060800
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
    @@ -1372,19 +1372,30 @@ case class FormatString(children: Expression*) extends Expression with ImplicitC
         val pattern = children.head.genCode(ctx)
     
         val argListGen = children.tail.map(x => (x.dataType, x.genCode(ctx)))
    -    val argListCode = argListGen.map(_._2.code + "\n")
    -
    -    val argListString = argListGen.foldLeft("")((s, v) => {
    -      val nullSafeString =
    +    val argList = ctx.freshName("argLists")
    +    val numArgLists = argListGen.length
    +    val argListCode = argListGen.zipWithIndex.map { case(v, index) =>
    +      val value =
             if (ctx.boxedType(v._1) != ctx.javaType(v._1)) {
               // Java primitives get boxed in order to allow null values.
               s"(${v._2.isNull}) ? (${ctx.boxedType(v._1)}) null : " +
                 s"new ${ctx.boxedType(v._1)}(${v._2.value})"
             } else {
               s"(${v._2.isNull}) ? null : ${v._2.value}"
             }
    -      s + "," + nullSafeString
    -    })
    +      s"""
    +         ${v._2.code}
    +         $argList[$index] = $value;
    +       """
    +    }
    +    val argListCodes = if (ctx.INPUT_ROW != null && ctx.currentVars == null) {
    +      ctx.splitExpressions(
    +        expressions = argListCode,
    +        funcName = "valueFormatString",
    +        arguments = ("InternalRow", ctx.INPUT_ROW) :: ("Object[]", argList) :: Nil)
    +    } else {
    +      argListCode.mkString("\n")
    +    }
    --- End diff --
    
    Sure


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19817: [SPARK-22603][SQL] Fix 64KB JVM bytecode limit problem w...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19817
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84173/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19817: [SPARK-22603][SQL] Fix 64KB JVM bytecode limit pr...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19817#discussion_r153060489
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
    @@ -1372,19 +1372,30 @@ case class FormatString(children: Expression*) extends Expression with ImplicitC
         val pattern = children.head.genCode(ctx)
     
         val argListGen = children.tail.map(x => (x.dataType, x.genCode(ctx)))
    -    val argListCode = argListGen.map(_._2.code + "\n")
    -
    -    val argListString = argListGen.foldLeft("")((s, v) => {
    -      val nullSafeString =
    +    val argList = ctx.freshName("argLists")
    +    val numArgLists = argListGen.length
    +    val argListCode = argListGen.zipWithIndex.map { case(v, index) =>
    +      val value =
             if (ctx.boxedType(v._1) != ctx.javaType(v._1)) {
               // Java primitives get boxed in order to allow null values.
               s"(${v._2.isNull}) ? (${ctx.boxedType(v._1)}) null : " +
                 s"new ${ctx.boxedType(v._1)}(${v._2.value})"
             } else {
               s"(${v._2.isNull}) ? null : ${v._2.value}"
             }
    -      s + "," + nullSafeString
    -    })
    +      s"""
    +         ${v._2.code}
    +         $argList[$index] = $value;
    +       """
    +    }
    +    val argListCodes = if (ctx.INPUT_ROW != null && ctx.currentVars == null) {
    +      ctx.splitExpressions(
    +        expressions = argListCode,
    +        funcName = "valueFormatString",
    +        arguments = ("InternalRow", ctx.INPUT_ROW) :: ("Object[]", argList) :: Nil)
    +    } else {
    +      argListCode.mkString("\n")
    +    }
    --- End diff --
    
    Could you create a `splitExpressions` in `CodegenContext` for avoiding the duplicate codes, like 
    
    ```Scala
        if (ctx.INPUT_ROW != null && ctx.currentVars == null) {
          ctx.splitExpressions(...)
        } else {
          inputs.mkString("\n")
        }
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19817: [SPARK-22603][SQL] Fix 64KB JVM bytecode limit problem w...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19817
  
    **[Test build #84173 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84173/testReport)** for PR 19817 at commit [`e5bcbc4`](https://github.com/apache/spark/commit/e5bcbc41933a2bf81201b9e7c753b3405b30e0db).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19817: [SPARK-22603][SQL] Fix 64KB JVM bytecode limit problem w...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19817
  
    **[Test build #84180 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84180/testReport)** for PR 19817 at commit [`84c8bf4`](https://github.com/apache/spark/commit/84c8bf4fa068e3e5247f3b69367f7edba09e6684).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19817: [SPARK-22603][SQL] Fix 64KB JVM bytecode limit problem w...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19817
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19817: [SPARK-22603][SQL] Fix 64KB JVM bytecode limit problem w...

Posted by mgaido91 <gi...@git.apache.org>.

Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/19817
  
    LGTM


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19817: [SPARK-22603][SQL] Fix 64KB JVM bytecode limit problem w...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19817
  
    **[Test build #84173 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84173/testReport)** for PR 19817 at commit [`e5bcbc4`](https://github.com/apache/spark/commit/e5bcbc41933a2bf81201b9e7c753b3405b30e0db).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19817: [SPARK-22603][SQL] Fix 64KB JVM bytecode limit problem w...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19817
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84180/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19817: [SPARK-22603][SQL] Fix 64KB JVM bytecode limit pr...

Posted by mgaido91 <gi...@git.apache.org>.

Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19817#discussion_r153063159
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
    @@ -1372,19 +1372,30 @@ case class FormatString(children: Expression*) extends Expression with ImplicitC
         val pattern = children.head.genCode(ctx)
     
         val argListGen = children.tail.map(x => (x.dataType, x.genCode(ctx)))
    -    val argListCode = argListGen.map(_._2.code + "\n")
    -
    -    val argListString = argListGen.foldLeft("")((s, v) => {
    -      val nullSafeString =
    +    val argList = ctx.freshName("argLists")
    +    val numArgLists = argListGen.length
    +    val argListCode = argListGen.zipWithIndex.map { case(v, index) =>
    +      val value =
             if (ctx.boxedType(v._1) != ctx.javaType(v._1)) {
               // Java primitives get boxed in order to allow null values.
               s"(${v._2.isNull}) ? (${ctx.boxedType(v._1)}) null : " +
                 s"new ${ctx.boxedType(v._1)}(${v._2.value})"
             } else {
               s"(${v._2.isNull}) ? null : ${v._2.value}"
             }
    -      s + "," + nullSafeString
    -    })
    +      s"""
    +         ${v._2.code}
    +         $argList[$index] = $value;
    +       """
    +    }
    +    val argListCodes = if (ctx.INPUT_ROW != null && ctx.currentVars == null) {
    +      ctx.splitExpressions(
    +        expressions = argListCode,
    +        funcName = "valueFormatString",
    +        arguments = ("InternalRow", ctx.INPUT_ROW) :: ("Object[]", argList) :: Nil)
    +    } else {
    +      argListCode.mkString("\n")
    +    }
    --- End diff --
    
    I wanted to submit a PR for that, but I was waiting for all the PRs related (this one should be the last one) to be merged. Let me know if I can do that or you are doing it. My suggestion would be: can't we just change the existing method to include this check?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19817: [SPARK-22603][SQL] Fix 64KB JVM bytecode limit problem w...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19817
  
    LGTM, merging to master/2.2!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org