Posted to reviews@spark.apache.org by kiszk <gi...@git.apache.org> on 2017/10/24 09:33:46 UTC
[GitHub] spark pull request #19563: Fix 64KB JVM bytecode limit problem in calculatin...
GitHub user kiszk opened a pull request:
https://github.com/apache/spark/pull/19563
Fix 64KB JVM bytecode limit problem in calculating hash for nested structs
## What changes were proposed in this pull request?
This PR avoids generating a single huge method when calculating a hash for nested structs.
Description will be updated with generated code.
## How was this patch tested?
Added new test to `HashExpressionsSuite`
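The core idea of the fix can be sketched as follows. This is a minimal, hypothetical illustration (not Spark's actual codegen; `CodeSplitter`, `apply_N`, and the statement strings are illustrative): rather than emitting every per-field hash statement into one giant generated method, the statements are grouped into fixed-size helper methods so that no single method approaches the JVM's 64KB bytecode limit per method.

```java
// Illustrative sketch of splitting generated code into helper methods.
// All names here are hypothetical; Spark's real mechanism is
// CodegenContext.splitExpressions.
public class CodeSplitter {
    // Returns generated source: one helper method per chunk of
    // statements, plus an apply() that invokes the helpers in order.
    public static String split(String[] stmts, int stmtsPerMethod) {
        StringBuilder helpers = new StringBuilder();
        StringBuilder calls = new StringBuilder();
        for (int i = 0, id = 0; i < stmts.length; i += stmtsPerMethod, id++) {
            helpers.append("private void apply_").append(id).append("(InternalRow row) {\n");
            for (int j = i; j < Math.min(i + stmtsPerMethod, stmts.length); j++) {
                helpers.append("  ").append(stmts[j]).append("\n");
            }
            helpers.append("}\n");
            calls.append("  apply_").append(id).append("(row);\n");
        }
        return helpers + "public void apply(InternalRow row) {\n" + calls + "}\n";
    }
}
```

With deeply nested structs the number of per-field statements grows multiplicatively, which is why the unsplit version could exceed the limit.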
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/kiszk/spark SPARK-22284
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19563.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19563
----
commit bb9191fa0633487f4241e10f1c1d28763fc90ecc
Author: Kazuaki Ishizaki <is...@jp.ibm.com>
Date: 2017-10-24T09:31:16Z
initial commit
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19563
**[Test build #83690 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83690/testReport)** for PR 19563 at commit [`7947ca2`](https://github.com/apache/spark/commit/7947ca2b78731f3110680111e6e4c35096a86752).
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19563
**[Test build #83010 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83010/testReport)** for PR 19563 at commit [`bb9191f`](https://github.com/apache/spark/commit/bb9191fa0633487f4241e10f1c1d28763fc90ecc).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19563
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83082/
Test PASSed.
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19563
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83017/
Test PASSed.
[GitHub] spark pull request #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit pr...
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/19563#discussion_r147573635
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala ---
@@ -389,9 +389,10 @@ abstract class HashExpression[E] extends Expression {
input: String,
result: String,
fields: Array[StructField]): String = {
- fields.zipWithIndex.map { case (field, index) =>
+ val hashes = fields.zipWithIndex.map { case (field, index) =>
nullSafeElementHash(input, index.toString, field.nullable, field.dataType, result, ctx)
- }.mkString("\n")
+ }
+ ctx.splitExpressions(hashes, "apply", ("InternalRow", input) :: Nil)
--- End diff --
Good catch, done
[GitHub] spark pull request #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit pr...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19563#discussion_r147084523
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala ---
@@ -389,9 +389,15 @@ abstract class HashExpression[E] extends Expression {
input: String,
result: String,
fields: Array[StructField]): String = {
- fields.zipWithIndex.map { case (field, index) =>
+ val hashes = fields.zipWithIndex.map { case (field, index) =>
nullSafeElementHash(input, index.toString, field.nullable, field.dataType, result, ctx)
- }.mkString("\n")
+ }
+ val args = if (ctx.INPUT_ROW != null) {
+ Seq(("InternalRow", input), ("InternalRow", ctx.INPUT_ROW))
--- End diff --
Sorry, I cannot understand why you need to pass `ctx.INPUT_ROW` as an argument; could you please explain? Thanks.
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19563
**[Test build #83010 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83010/testReport)** for PR 19563 at commit [`bb9191f`](https://github.com/apache/spark/commit/bb9191fa0633487f4241e10f1c1d28763fc90ecc).
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19563
Merged build finished. Test PASSed.
[GitHub] spark pull request #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit pr...
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/19563#discussion_r150280850
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala ---
@@ -639,6 +639,53 @@ class HashExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
assert(hiveHashPlan(wideRow).getInt(0) == hiveHashEval)
}
+ test("SPARK-22284: Compute hash for nested structs") {
+ val M = 80
+ val N = 10
+ val L = M * N
+ val O = 50
+ val seed = 42
+
+ val wideRow1 = new GenericInternalRow(Seq.tabulate(O)(j =>
+ new GenericInternalRow(Seq.tabulate(L)(i =>
+ new GenericInternalRow(Array[Any](
+ UTF8String.fromString((j * L + i).toString))))
+ .toArray[Any])).toArray[Any])
+ val inner1 = new StructType(
+ (0 until L).map(_ => StructField("structOfString", structOfString)).toArray)
+ val schema1 = new StructType(
+ (0 until O).map(_ => StructField("structOfStructOfStrings", inner1)).toArray)
+ val exprs1 = schema1.fields.zipWithIndex.map { case (f, i) =>
+ BoundReference(i, f.dataType, true)
+ }
+ val murmur3HashExpr1 = Murmur3Hash(exprs1, seed)
+ val murmur3HashPlan1 = GenerateMutableProjection.generate(Seq(murmur3HashExpr1))
+
+ val murmursHashEval1 = Murmur3Hash(exprs1, seed).eval(wideRow1)
+ assert(murmur3HashPlan1(wideRow1).getInt(0) == murmursHashEval1)
+
+ val wideRow2 = new GenericInternalRow(Seq.tabulate(O)(k =>
--- End diff --
Sure, done
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19563
**[Test build #83185 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83185/testReport)** for PR 19563 at commit [`70c2304`](https://github.com/apache/spark/commit/70c2304393d6ec967a509fde8746e8268e7c80ee).
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19563
**[Test build #83017 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83017/testReport)** for PR 19563 at commit [`67d1e58`](https://github.com/apache/spark/commit/67d1e58c5b2eaccda7a11d12e4321b798a223d0c).
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19563
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83185/
Test PASSed.
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19563
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83010/
Test FAILed.
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19563
**[Test build #83017 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83017/testReport)** for PR 19563 at commit [`67d1e58`](https://github.com/apache/spark/commit/67d1e58c5b2eaccda7a11d12e4321b798a223d0c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/19563
ping @cloud-fan
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19563
thanks, merging to master/2.2!
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19563
**[Test build #83690 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83690/testReport)** for PR 19563 at commit [`7947ca2`](https://github.com/apache/spark/commit/7947ca2b78731f3110680111e6e4c35096a86752).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19563
**[Test build #83082 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83082/testReport)** for PR 19563 at commit [`63c3a07`](https://github.com/apache/spark/commit/63c3a075ac76b5efe8ad4b53af646e08b9c7ef9c).
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19563
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83690/
Test PASSed.
[GitHub] spark pull request #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit pr...
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/19563#discussion_r147573624
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala ---
@@ -639,6 +639,63 @@ class HashExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
assert(hiveHashPlan(wideRow).getInt(0) == hiveHashEval)
}
+ test("SPARK-22284: Compute hash for nested structs") {
+ val M = 80
+ val N = 10
+ val L = M * N
+ val O = 50
+ val seed = 42
+
+ val wideRow1 = new GenericInternalRow(Seq.tabulate(O)(j =>
+ new GenericInternalRow(Seq.tabulate(L)(i =>
+ new GenericInternalRow(Array[Any](
+ UTF8String.fromString((j * L + i).toString))))
+ .toArray[Any])).toArray[Any])
+ var inner1 = new StructType()
--- End diff --
I see, done
[GitHub] spark pull request #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit pr...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19563#discussion_r147129483
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala ---
@@ -389,9 +389,10 @@ abstract class HashExpression[E] extends Expression {
input: String,
result: String,
fields: Array[StructField]): String = {
- fields.zipWithIndex.map { case (field, index) =>
+ val hashes = fields.zipWithIndex.map { case (field, index) =>
nullSafeElementHash(input, index.toString, field.nullable, field.dataType, result, ctx)
- }.mkString("\n")
+ }
+ ctx.splitExpressions(hashes, "apply", ("InternalRow", input) :: Nil)
--- End diff --
Then I think the best option here would be `ctx.splitExpressions(input, hashes)`, which contains additional safety checks and I think is simpler.
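The kind of safety check being referred to can be sketched like this. This is a hypothetical simplification (class and threshold are illustrative, not Spark's real implementation): a safer splitter first checks whether the combined code is small enough to stay inline, and only chunks it into helper methods when it is actually large.

```java
// Hypothetical sketch: only split generated code into helper methods
// when the combined body is large; tiny bodies stay inline.
public class SafeSplitter {
    static final int INLINE_THRESHOLD = 1024; // illustrative limit, in characters

    public static String splitOrInline(String[] stmts, int stmtsPerMethod) {
        int total = 0;
        for (String s : stmts) total += s.length();
        StringBuilder out = new StringBuilder();
        if (total < INLINE_THRESHOLD) {
            // Small enough: emit the statements as a single inline block.
            for (String s : stmts) out.append(s).append("\n");
            return out.toString();
        }
        // Otherwise wrap fixed-size chunks of statements in helper methods
        // and emit an apply() that calls them in order.
        StringBuilder calls = new StringBuilder();
        for (int i = 0, id = 0; i < stmts.length; i += stmtsPerMethod, id++) {
            out.append("private void apply_").append(id).append("(InternalRow row) {\n");
            for (int j = i; j < Math.min(i + stmtsPerMethod, stmts.length); j++) {
                out.append("  ").append(stmts[j]).append("\n");
            }
            out.append("}\n");
            calls.append("  apply_").append(id).append("(row);\n");
        }
        return out + "public void apply(InternalRow row) {\n" + calls + "}\n";
    }
}
```

Avoiding the split for small bodies matters because every extra method call in generated code has its own overhead.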
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19563
Merged build finished. Test PASSed.
[GitHub] spark pull request #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit pr...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/19563
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19563
**[Test build #83082 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83082/testReport)** for PR 19563 at commit [`63c3a07`](https://github.com/apache/spark/commit/63c3a075ac76b5efe8ad4b53af646e08b9c7ef9c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit pr...
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/19563#discussion_r147106199
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala ---
@@ -389,9 +389,15 @@ abstract class HashExpression[E] extends Expression {
input: String,
result: String,
fields: Array[StructField]): String = {
- fields.zipWithIndex.map { case (field, index) =>
+ val hashes = fields.zipWithIndex.map { case (field, index) =>
nullSafeElementHash(input, index.toString, field.nullable, field.dataType, result, ctx)
- }.mkString("\n")
+ }
+ val args = if (ctx.INPUT_ROW != null) {
+ Seq(("InternalRow", input), ("InternalRow", ctx.INPUT_ROW))
--- End diff --
Good question. I conservatively passed `ctx.INPUT_ROW`. Revisiting this, I believe the elements of a `struct` would not use `ctx.INPUT_ROW`.
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19563
Merged build finished. Test PASSed.
[GitHub] spark pull request #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit pr...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19563#discussion_r147145637
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala ---
@@ -639,6 +639,63 @@ class HashExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
assert(hiveHashPlan(wideRow).getInt(0) == hiveHashEval)
}
+ test("SPARK-22284: Compute hash for nested structs") {
+ val M = 80
+ val N = 10
+ val L = M * N
+ val O = 50
+ val seed = 42
+
+ val wideRow1 = new GenericInternalRow(Seq.tabulate(O)(j =>
+ new GenericInternalRow(Seq.tabulate(L)(i =>
+ new GenericInternalRow(Array[Any](
+ UTF8String.fromString((j * L + i).toString))))
+ .toArray[Any])).toArray[Any])
+ var inner1 = new StructType()
--- End diff --
what about avoiding the usage of `var` here and in the other places by passing a `Seq` of fields in the constructor?
The fields may be created using range generation and `map` instead of `for` loops.
I think this way we would be more in line with general functional Scala style; what do you think?
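The suggested pattern (range generation plus `map` instead of mutable accumulation in a loop) looks roughly like this. The sketch below uses Java streams to mirror the Scala idiom `(0 until n).map(...)`; the class and field names are illustrative, not from the patch.

```java
import java.util.stream.IntStream;

// Sketch of the suggested style: build all fields with a range + map
// pipeline instead of appending to a mutable variable inside a loop.
public class FieldBuilder {
    public static String[] buildFieldNames(int n) {
        // Mirrors Scala's (0 until n).map(i => s"field_$i") without a var.
        return IntStream.range(0, n)
                .mapToObj(i -> "field_" + i)
                .toArray(String[]::new);
    }
}
```

All fields are produced in one expression, so no partially built mutable state is ever visible.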
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19563
Merged build finished. Test FAILed.
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/19563
Thanks for addressing the comments @kiszk, now it LGTM.
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19563
**[Test build #83185 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83185/testReport)** for PR 19563 at commit [`70c2304`](https://github.com/apache/spark/commit/70c2304393d6ec967a509fde8746e8268e7c80ee).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19563
LGTM
[GitHub] spark pull request #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit pr...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19563#discussion_r150224414
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala ---
@@ -639,6 +639,53 @@ class HashExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
assert(hiveHashPlan(wideRow).getInt(0) == hiveHashEval)
}
+ test("SPARK-22284: Compute hash for nested structs") {
+ val M = 80
+ val N = 10
+ val L = M * N
+ val O = 50
+ val seed = 42
+
+ val wideRow1 = new GenericInternalRow(Seq.tabulate(O)(j =>
+ new GenericInternalRow(Seq.tabulate(L)(i =>
+ new GenericInternalRow(Array[Any](
+ UTF8String.fromString((j * L + i).toString))))
+ .toArray[Any])).toArray[Any])
+ val inner1 = new StructType(
+ (0 until L).map(_ => StructField("structOfString", structOfString)).toArray)
+ val schema1 = new StructType(
+ (0 until O).map(_ => StructField("structOfStructOfStrings", inner1)).toArray)
+ val exprs1 = schema1.fields.zipWithIndex.map { case (f, i) =>
+ BoundReference(i, f.dataType, true)
+ }
+ val murmur3HashExpr1 = Murmur3Hash(exprs1, seed)
+ val murmur3HashPlan1 = GenerateMutableProjection.generate(Seq(murmur3HashExpr1))
+
+ val murmursHashEval1 = Murmur3Hash(exprs1, seed).eval(wideRow1)
+ assert(murmur3HashPlan1(wideRow1).getInt(0) == murmursHashEval1)
+
+ val wideRow2 = new GenericInternalRow(Seq.tabulate(O)(k =>
--- End diff --
I think this case totally covers the previous case, can we just keep this and remove `wideRow1`?
[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19563
Merged build finished. Test PASSed.