You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by kiszk <gi...@git.apache.org> on 2017/11/12 18:02:58 UTC

[GitHub] spark pull request #19729: [SPARK-22499][SQL] Fix 64KB JVM bytecode limit pr...

GitHub user kiszk opened a pull request:

    https://github.com/apache/spark/pull/19729

    [SPARK-22499][SQL] Fix 64KB JVM bytecode limit problem with least and greatest

    ## What changes were proposed in this pull request?
    
    This PR changes `least` and `greatest` code generation to place generated code for expression for arguments into separated methods if these size could be large.
    This PR resolved two cases:
    
    * `least` with a lot of argument
    * `greatest` with a lot of argument
    
    ## How was this patch tested?
    
    Added a new test case into `ArithmeticExpressionsSuite`


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kiszk/spark SPARK-22499

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19729.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19729
    
----
commit bc3f0a43956a1e46928f0269d1d8d0aad9ee02b1
Author: Kazuaki Ishizaki <is...@jp.ibm.com>
Date:   2017-11-12T18:00:08Z

    initial commit

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19729: [SPARK-22499][SQL] Fix 64KB JVM bytecode limit pr...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/19729


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19729: [SPARK-22499][SQL] Fix 64KB JVM bytecode limit problem w...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19729
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83749/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19729: [SPARK-22499][SQL] Fix 64KB JVM bytecode limit problem w...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19729
  
    **[Test build #83749 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83749/testReport)** for PR 19729 at commit [`bc3f0a4`](https://github.com/apache/spark/commit/bc3f0a43956a1e46928f0269d1d8d0aad9ee02b1).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19729: [SPARK-22499][SQL] Fix 64KB JVM bytecode limit problem w...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19729
  
    thanks, merging to master/2.2


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19729: [SPARK-22499][SQL] Fix 64KB JVM bytecode limit pr...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19729#discussion_r150477798
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala ---
    @@ -604,6 +604,8 @@ case class Least(children: Seq[Expression]) extends Expression {
         val evalChildren = children.map(_.genCode(ctx))
         val first = evalChildren(0)
    --- End diff --
    
    I am sorry that I cannot understand your opinion.
    Could you please explain how your opinion works in the case that all of childrens with `ev.value = -2` and `ev.isNull = false` for `Greatest` by using `ctx.defaultValue`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19729: [SPARK-22499][SQL] Fix 64KB JVM bytecode limit pr...

Posted by mgaido91 <gi...@git.apache.org>.

Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19729#discussion_r150419114
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala ---
    @@ -604,6 +604,8 @@ case class Least(children: Seq[Expression]) extends Expression {
         val evalChildren = children.map(_.genCode(ctx))
         val first = evalChildren(0)
    --- End diff --
    
    what about avoiding splitting `first` and `rest` and treat them all the same? Since the declaration of `ev.isNull` and `ev.value` is moved, they can all be treated the same way. In this way the code would be cleaner, wouldn't it?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19729: [SPARK-22499][SQL] Fix 64KB JVM bytecode limit pr...

Posted by mgaido91 <gi...@git.apache.org>.

Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19729#discussion_r150473862
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala ---
    @@ -604,6 +604,8 @@ case class Least(children: Seq[Expression]) extends Expression {
         val evalChildren = children.map(_.genCode(ctx))
         val first = evalChildren(0)
    --- End diff --
    
    the `value` is used only when `isNull` is `false`. And in that context a proper value has been set for `value`, thus the problem you are describing can't happen IMHO.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19729: [SPARK-22499][SQL] Fix 64KB JVM bytecode limit problem w...

Posted by mgaido91 <gi...@git.apache.org>.

Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/19729
  
    LGTM


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19729: [SPARK-22499][SQL] Fix 64KB JVM bytecode limit problem w...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19729
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19729: [SPARK-22499][SQL] Fix 64KB JVM bytecode limit problem w...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19729
  
    **[Test build #83793 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83793/testReport)** for PR 19729 at commit [`5e87935`](https://github.com/apache/spark/commit/5e87935789ba7c51f1858d1ef003e92536fc3f82).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19729: [SPARK-22499][SQL] Fix 64KB JVM bytecode limit pr...

Posted by mgaido91 <gi...@git.apache.org>.

Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19729#discussion_r150419122
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala ---
    @@ -670,6 +673,8 @@ case class Greatest(children: Seq[Expression]) extends Expression {
         val evalChildren = children.map(_.genCode(ctx))
         val first = evalChildren(0)
    --- End diff --
    
    ditto


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19729: [SPARK-22499][SQL] Fix 64KB JVM bytecode limit problem w...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19729
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83793/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19729: [SPARK-22499][SQL] Fix 64KB JVM bytecode limit problem w...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19729
  
    **[Test build #83793 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83793/testReport)** for PR 19729 at commit [`5e87935`](https://github.com/apache/spark/commit/5e87935789ba7c51f1858d1ef003e92536fc3f82).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19729: [SPARK-22499][SQL] Fix 64KB JVM bytecode limit pr...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19729#discussion_r150421423
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala ---
    @@ -604,6 +604,8 @@ case class Least(children: Seq[Expression]) extends Expression {
         val evalChildren = children.map(_.genCode(ctx))
         val first = evalChildren(0)
    --- End diff --
    
    I know what you did. This PR cannot follow the approach straightforward due to initial value. We can easily know `min`/`max` value for primitive type for an initial value. How can we know the initial value for complicated type?
    IMHO, the first element would be good for an initial value.
    
    What do you think?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19729: [SPARK-22499][SQL] Fix 64KB JVM bytecode limit pr...

Posted by mgaido91 <gi...@git.apache.org>.

Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19729#discussion_r150420897
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala ---
    @@ -604,6 +604,8 @@ case class Least(children: Seq[Expression]) extends Expression {
         val evalChildren = children.map(_.genCode(ctx))
         val first = evalChildren(0)
    --- End diff --
    
    I am thinking about doing what I did in #19720: here we might follow the same approach. Also for consistency across the code. Indeed, there would be an `if` more, but I think that the overhead would be negligible.
    Or if we state that the right approach is this one to avoid one more `if`, I am best to change my approach in the PR I mentioned.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19729: [SPARK-22499][SQL] Fix 64KB JVM bytecode limit pr...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19729#discussion_r150437176
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala ---
    @@ -604,6 +604,8 @@ case class Least(children: Seq[Expression]) extends Expression {
         val evalChildren = children.map(_.genCode(ctx))
         val first = evalChildren(0)
    --- End diff --
    
    [`defaultValue` function](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L596) returns `-1` or `null`.
    
    When we compare any non-primitive value with `null`, `NullPointerException` may occur.  
    Wehn we compare primitive value with initial value `-1`, the result may cause an incorrect result. For example, in `Greatest`, if all of the given values are `-2`, the result will be `-1`. It is incorrect.
    
    What do you think?



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19729: [SPARK-22499][SQL] Fix 64KB JVM bytecode limit problem w...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19729
  
    **[Test build #83749 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83749/testReport)** for PR 19729 at commit [`bc3f0a4`](https://github.com/apache/spark/commit/bc3f0a43956a1e46928f0269d1d8d0aad9ee02b1).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19729: [SPARK-22499][SQL] Fix 64KB JVM bytecode limit pr...

Posted by mgaido91 <gi...@git.apache.org>.

Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19729#discussion_r150542851
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala ---
    @@ -604,6 +604,8 @@ case class Least(children: Seq[Expression]) extends Expression {
         val evalChildren = children.map(_.genCode(ctx))
         val first = evalChildren(0)
    --- End diff --
    
    yes exactly, it doesn't matter how `ev.value` is initialized.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19729: [SPARK-22499][SQL] Fix 64KB JVM bytecode limit pr...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19729#discussion_r150542028
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala ---
    @@ -604,6 +604,8 @@ case class Least(children: Seq[Expression]) extends Expression {
         val evalChildren = children.map(_.genCode(ctx))
         val first = evalChildren(0)
    --- End diff --
    
    Thank you for your explanation. Got it.  
    Your idea is to set `true` into `ev.isNull` rather than using `ctx.defaultValue` as an initial value for `ev.value`.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19729: [SPARK-22499][SQL] Fix 64KB JVM bytecode limit pr...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19729#discussion_r150420093
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala ---
    @@ -604,6 +604,8 @@ case class Least(children: Seq[Expression]) extends Expression {
         val evalChildren = children.map(_.genCode(ctx))
         val first = evalChildren(0)
    --- End diff --
    
    Do you want to assign `evalChildren(0).value` into `ev.value` and then compare all of `evalChildren`? It can avoid `rest` for cleaner code. However, it increases one comparison.  
    Or, do you have another idea?



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19729: [SPARK-22499][SQL] Fix 64KB JVM bytecode limit problem w...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19729
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19729: [SPARK-22499][SQL] Fix 64KB JVM bytecode limit pr...

Posted by mgaido91 <gi...@git.apache.org>.

Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19729#discussion_r150422088
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala ---
    @@ -604,6 +604,8 @@ case class Least(children: Seq[Expression]) extends Expression {
         val evalChildren = children.map(_.genCode(ctx))
         val first = evalChildren(0)
    --- End diff --
    
    I don't see any difference on the initial value. IMHO the initial value in my PR as well as here should be "null",  which means `ev. isNull` to `true` and the initial value for `ev.value` is irrelevant and it could be set to what is returned by the `defaultValue` function of the `CodegenContext`. Do you have any concern with this?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19729: [SPARK-22499][SQL] Fix 64KB JVM bytecode limit pr...

Posted by mgaido91 <gi...@git.apache.org>.

Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19729#discussion_r150501441
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala ---
    @@ -604,6 +604,8 @@ case class Least(children: Seq[Expression]) extends Expression {
         val evalChildren = children.map(_.genCode(ctx))
         val first = evalChildren(0)
    --- End diff --
    
    before using `ev.value`, `ev.isNull` is checked (https://github.com/kiszk/spark/blob/bc3f0a43956a1e46928f0269d1d8d0aad9ee02b1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala#L612). So in case of all children with `-2`, the first one will see that `ev.isNull` is `true` and then it will set it to `false` and assign `-2` to `ev.value`. Thus the initial value assigned to `ev.value` is irrelevant, since `ev.isNull` is `true` and as far as `ev.isNull` is `true`, what `ev.value` contains doesn't matter.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org