Posted to reviews@spark.apache.org by kiszk <gi...@git.apache.org> on 2017/10/24 09:33:46 UTC
[GitHub] spark pull request #19563: Fix 64KB JVM bytecode limit problem in calculatin...
GitHub user kiszk opened a pull request:
https://github.com/apache/spark/pull/19563
Fix 64KB JVM bytecode limit problem in calculating hash for nested structs
## What changes were proposed in this pull request?
This PR avoids generating a single huge method when calculating a hash for nested structs.
Description will be updated with generated code.
## How was this patch tested?
Added new test to `HashExpressionsSuite`
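The core idea of the fix can be sketched as follows. This is a minimal, hypothetical illustration (not Spark's actual codegen; `CodeSplitter`, `apply_N`, and the statement strings are illustrative): rather than emitting every per-field hash statement into one giant generated method, the statements are grouped into fixed-size helper methods so that no single method approaches the JVM's 64KB bytecode limit per method.

```java
// Illustrative sketch of splitting generated code into helper methods.
// All names here are hypothetical; Spark's real mechanism is
// CodegenContext.splitExpressions.
public class CodeSplitter {
    // Returns generated source: one helper method per chunk of
    // statements, plus an apply() that invokes the helpers in order.
    public static String split(String[] stmts, int stmtsPerMethod) {
        StringBuilder helpers = new StringBuilder();
        StringBuilder calls = new StringBuilder();
        for (int i = 0, id = 0; i < stmts.length; i += stmtsPerMethod, id++) {
            helpers.append("private void apply_").append(id).append("(InternalRow row) {\n");
            for (int j = i; j < Math.min(i + stmtsPerMethod, stmts.length); j++) {
                helpers.append("  ").append(stmts[j]).append("\n");
            }
            helpers.append("}\n");
            calls.append("  apply_").append(id).append("(row);\n");
        }
        return helpers + "public void apply(InternalRow row) {\n" + calls + "}\n";
    }
}
```

With deeply nested structs the number of per-field statements grows multiplicatively, which is why the unsplit version could exceed the limit.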
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/kiszk/spark SPARK-22284
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19563.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19563
----
commit bb9191fa0633487f4241e10f1c1d28763fc90ecc
Author: Kazuaki Ishizaki <is...@jp.ibm.com>
Date: 2017-10-24T09:31:16Z
initial commit
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19563
**[Test build #83690 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83690/testReport)** for PR 19563 at commit [`7947ca2`](https://github.com/apache/spark/commit/7947ca2b78731f3110680111e6e4c35096a86752).
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19563
**[Test build #83010 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83010/testReport)** for PR 19563 at commit [`bb9191f`](https://github.com/apache/spark/commit/bb9191fa0633487f4241e10f1c1d28763fc90ecc).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19563
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83082/
Test PASSed.
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19563
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83017/
Test PASSed.
[GitHub] spark pull request #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit pr...
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/19563#discussion_r147573635
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala ---
@@ -389,9 +389,10 @@ abstract class HashExpression[E] extends Expression {
input: String,
result: String,
fields: Array[StructField]): String = {
- fields.zipWithIndex.map { case (field, index) =>
+ val hashes = fields.zipWithIndex.map { case (field, index) =>
nullSafeElementHash(input, index.toString, field.nullable, field.dataType, result, ctx)
- }.mkString("\n")
+ }
+ ctx.splitExpressions(hashes, "apply", ("InternalRow", input) :: Nil)
--- End diff --
Good catch, done
[GitHub] spark pull request #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit pr...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19563#discussion_r147084523
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala ---
@@ -389,9 +389,15 @@ abstract class HashExpression[E] extends Expression {
input: String,
result: String,
fields: Array[StructField]): String = {
- fields.zipWithIndex.map { case (field, index) =>
+ val hashes = fields.zipWithIndex.map { case (field, index) =>
nullSafeElementHash(input, index.toString, field.nullable, field.dataType, result, ctx)
- }.mkString("\n")
+ }
+ val args = if (ctx.INPUT_ROW != null) {
+ Seq(("InternalRow", input), ("InternalRow", ctx.INPUT_ROW))
--- End diff --
Sorry, I cannot understand why you need to pass `ctx.INPUT_ROW` as an argument; could you please explain? Thanks.
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19563
**[Test build #83010 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83010/testReport)** for PR 19563 at commit [`bb9191f`](https://github.com/apache/spark/commit/bb9191fa0633487f4241e10f1c1d28763fc90ecc).
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19563
Merged build finished. Test PASSed.
[GitHub] spark pull request #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit pr...
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/19563#discussion_r150280850
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala ---
@@ -639,6 +639,53 @@ class HashExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
assert(hiveHashPlan(wideRow).getInt(0) == hiveHashEval)
}
+ test("SPARK-22284: Compute hash for nested structs") {
+ val M = 80
+ val N = 10
+ val L = M * N
+ val O = 50
+ val seed = 42
+
+ val wideRow1 = new GenericInternalRow(Seq.tabulate(O)(j =>
+ new GenericInternalRow(Seq.tabulate(L)(i =>
+ new GenericInternalRow(Array[Any](
+ UTF8String.fromString((j * L + i).toString))))
+ .toArray[Any])).toArray[Any])
+ val inner1 = new StructType(
+ (0 until L).map(_ => StructField("structOfString", structOfString)).toArray)
+ val schema1 = new StructType(
+ (0 until O).map(_ => StructField("structOfStructOfStrings", inner1)).toArray)
+ val exprs1 = schema1.fields.zipWithIndex.map { case (f, i) =>
+ BoundReference(i, f.dataType, true)
+ }
+ val murmur3HashExpr1 = Murmur3Hash(exprs1, seed)
+ val murmur3HashPlan1 = GenerateMutableProjection.generate(Seq(murmur3HashExpr1))
+
+ val murmursHashEval1 = Murmur3Hash(exprs1, seed).eval(wideRow1)
+ assert(murmur3HashPlan1(wideRow1).getInt(0) == murmursHashEval1)
+
+ val wideRow2 = new GenericInternalRow(Seq.tabulate(O)(k =>
--- End diff --
Sure, done
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19563
**[Test build #83185 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83185/testReport)** for PR 19563 at commit [`70c2304`](https://github.com/apache/spark/commit/70c2304393d6ec967a509fde8746e8268e7c80ee).
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19563
**[Test build #83017 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83017/testReport)** for PR 19563 at commit [`67d1e58`](https://github.com/apache/spark/commit/67d1e58c5b2eaccda7a11d12e4321b798a223d0c).
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19563
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83185/
Test PASSed.
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19563
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83010/
Test FAILed.
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19563
**[Test build #83017 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83017/testReport)** for PR 19563 at commit [`67d1e58`](https://github.com/apache/spark/commit/67d1e58c5b2eaccda7a11d12e4321b798a223d0c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/19563
ping @cloud-fan
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19563
thanks, merging to master/2.2!
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19563
**[Test build #83690 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83690/testReport)** for PR 19563 at commit [`7947ca2`](https://github.com/apache/spark/commit/7947ca2b78731f3110680111e6e4c35096a86752).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19563
**[Test build #83082 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83082/testReport)** for PR 19563 at commit [`63c3a07`](https://github.com/apache/spark/commit/63c3a075ac76b5efe8ad4b53af646e08b9c7ef9c).
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19563
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83690/
Test PASSed.
[GitHub] spark pull request #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit pr...
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/19563#discussion_r147573624
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala ---
@@ -639,6 +639,63 @@ class HashExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
assert(hiveHashPlan(wideRow).getInt(0) == hiveHashEval)
}
+ test("SPARK-22284: Compute hash for nested structs") {
+ val M = 80
+ val N = 10
+ val L = M * N
+ val O = 50
+ val seed = 42
+
+ val wideRow1 = new GenericInternalRow(Seq.tabulate(O)(j =>
+ new GenericInternalRow(Seq.tabulate(L)(i =>
+ new GenericInternalRow(Array[Any](
+ UTF8String.fromString((j * L + i).toString))))
+ .toArray[Any])).toArray[Any])
+ var inner1 = new StructType()
--- End diff --
I see, done
[GitHub] spark pull request #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit pr...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19563#discussion_r147129483
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala ---
@@ -389,9 +389,10 @@ abstract class HashExpression[E] extends Expression {
input: String,
result: String,
fields: Array[StructField]): String = {
- fields.zipWithIndex.map { case (field, index) =>
+ val hashes = fields.zipWithIndex.map { case (field, index) =>
nullSafeElementHash(input, index.toString, field.nullable, field.dataType, result, ctx)
- }.mkString("\n")
+ }
+ ctx.splitExpressions(hashes, "apply", ("InternalRow", input) :: Nil)
--- End diff --
Then I think the best option here would be `ctx.splitExpressions(input, hashes)`, which contains additional safety checks and I think is simpler.
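The kind of safety check being referred to can be sketched like this. This is a hypothetical simplification (class and threshold are illustrative, not Spark's real implementation): a safer splitter first checks whether the combined code is small enough to stay inline, and only chunks it into helper methods when it is actually large.

```java
// Hypothetical sketch: only split generated code into helper methods
// when the combined body is large; tiny bodies stay inline.
public class SafeSplitter {
    static final int INLINE_THRESHOLD = 1024; // illustrative limit, in characters

    public static String splitOrInline(String[] stmts, int stmtsPerMethod) {
        int total = 0;
        for (String s : stmts) total += s.length();
        StringBuilder out = new StringBuilder();
        if (total < INLINE_THRESHOLD) {
            // Small enough: emit the statements as a single inline block.
            for (String s : stmts) out.append(s).append("\n");
            return out.toString();
        }
        // Otherwise wrap fixed-size chunks of statements in helper methods
        // and emit an apply() that calls them in order.
        StringBuilder calls = new StringBuilder();
        for (int i = 0, id = 0; i < stmts.length; i += stmtsPerMethod, id++) {
            out.append("private void apply_").append(id).append("(InternalRow row) {\n");
            for (int j = i; j < Math.min(i + stmtsPerMethod, stmts.length); j++) {
                out.append("  ").append(stmts[j]).append("\n");
            }
            out.append("}\n");
            calls.append("  apply_").append(id).append("(row);\n");
        }
        return out + "public void apply(InternalRow row) {\n" + calls + "}\n";
    }
}
```

Avoiding the split for small bodies matters because every extra method call in generated code has its own overhead.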
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19563
Merged build finished. Test PASSed.
[GitHub] spark pull request #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit pr...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/19563
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19563
**[Test build #83082 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83082/testReport)** for PR 19563 at commit [`63c3a07`](https://github.com/apache/spark/commit/63c3a075ac76b5efe8ad4b53af646e08b9c7ef9c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit pr...
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/19563#discussion_r147106199
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala ---
@@ -389,9 +389,15 @@ abstract class HashExpression[E] extends Expression {
input: String,
result: String,
fields: Array[StructField]): String = {
- fields.zipWithIndex.map { case (field, index) =>
+ val hashes = fields.zipWithIndex.map { case (field, index) =>
nullSafeElementHash(input, index.toString, field.nullable, field.dataType, result, ctx)
- }.mkString("\n")
+ }
+ val args = if (ctx.INPUT_ROW != null) {
+ Seq(("InternalRow", input), ("InternalRow", ctx.INPUT_ROW))
--- End diff --
Good question. I conservatively passed `ctx.INPUT_ROW`. Revisiting this, I believe the elements of a `struct` would not use `ctx.INPUT_ROW`.
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19563
Merged build finished. Test PASSed.
[GitHub] spark pull request #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit pr...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19563#discussion_r147145637
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala ---
@@ -639,6 +639,63 @@ class HashExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
assert(hiveHashPlan(wideRow).getInt(0) == hiveHashEval)
}
+ test("SPARK-22284: Compute hash for nested structs") {
+ val M = 80
+ val N = 10
+ val L = M * N
+ val O = 50
+ val seed = 42
+
+ val wideRow1 = new GenericInternalRow(Seq.tabulate(O)(j =>
+ new GenericInternalRow(Seq.tabulate(L)(i =>
+ new GenericInternalRow(Array[Any](
+ UTF8String.fromString((j * L + i).toString))))
+ .toArray[Any])).toArray[Any])
+ var inner1 = new StructType()
--- End diff --
what about avoiding the usage of `var` here and in the other places by passing a `Seq` of fields in the constructor?
The fields may be created using range generation and `map` instead of `for` loops.
I think this way we would be more in line with general functional Scala style; what do you think?
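The suggested pattern (range generation plus `map` instead of mutable accumulation in a loop) looks roughly like this. The sketch below uses Java streams to mirror the Scala idiom `(0 until n).map(...)`; the class and field names are illustrative, not from the patch.

```java
import java.util.stream.IntStream;

// Sketch of the suggested style: build all fields with a range + map
// pipeline instead of appending to a mutable variable inside a loop.
public class FieldBuilder {
    public static String[] buildFieldNames(int n) {
        // Mirrors Scala's (0 until n).map(i => s"field_$i") without a var.
        return IntStream.range(0, n)
                .mapToObj(i -> "field_" + i)
                .toArray(String[]::new);
    }
}
```

All fields are produced in one expression, so no partially built mutable state is ever visible.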
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19563
Merged build finished. Test FAILed.
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/19563
Thanks for addressing the comments @kiszk, now it LGTM.
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19563
**[Test build #83185 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83185/testReport)** for PR 19563 at commit [`70c2304`](https://github.com/apache/spark/commit/70c2304393d6ec967a509fde8746e8268e7c80ee).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19563
LGTM
[GitHub] spark pull request #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit pr...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19563#discussion_r150224414
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala ---
@@ -639,6 +639,53 @@ class HashExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
assert(hiveHashPlan(wideRow).getInt(0) == hiveHashEval)
}
+ test("SPARK-22284: Compute hash for nested structs") {
+ val M = 80
+ val N = 10
+ val L = M * N
+ val O = 50
+ val seed = 42
+
+ val wideRow1 = new GenericInternalRow(Seq.tabulate(O)(j =>
+ new GenericInternalRow(Seq.tabulate(L)(i =>
+ new GenericInternalRow(Array[Any](
+ UTF8String.fromString((j * L + i).toString))))
+ .toArray[Any])).toArray[Any])
+ val inner1 = new StructType(
+ (0 until L).map(_ => StructField("structOfString", structOfString)).toArray)
+ val schema1 = new StructType(
+ (0 until O).map(_ => StructField("structOfStructOfStrings", inner1)).toArray)
+ val exprs1 = schema1.fields.zipWithIndex.map { case (f, i) =>
+ BoundReference(i, f.dataType, true)
+ }
+ val murmur3HashExpr1 = Murmur3Hash(exprs1, seed)
+ val murmur3HashPlan1 = GenerateMutableProjection.generate(Seq(murmur3HashExpr1))
+
+ val murmursHashEval1 = Murmur3Hash(exprs1, seed).eval(wideRow1)
+ assert(murmur3HashPlan1(wideRow1).getInt(0) == murmursHashEval1)
+
+ val wideRow2 = new GenericInternalRow(Seq.tabulate(O)(k =>
--- End diff --
I think this case totally covers the previous case, can we just keep this and remove `wideRow1`?
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19563
Merged build finished. Test PASSed.