You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by kiszk <gi...@git.apache.org> on 2017/11/12 19:06:17 UTC
[GitHub] spark pull request #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit pr...
GitHub user kiszk opened a pull request:
https://github.com/apache/spark/pull/19730
[SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem with cast
## What changes were proposed in this pull request?
This PR changes `cast` code generation to place generated code for expression for fields of a structure into separated methods if these size could be large.
## How was this patch tested?
Added new test cases into `CastSuite`
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/kiszk/spark SPARK-22500
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19730.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19730
----
commit 90136717ca01a7fe426963fffabe3516d506382d
Author: Kazuaki Ishizaki <is...@jp.ibm.com>
Date: 2017-11-12T19:02:53Z
initial commit
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem w...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19730
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem w...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19730
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83750/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit pr...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19730#discussion_r150584342
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -1039,13 +1039,19 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
}
}
"""
- }.mkString("\n")
+ }
+ val fieldsEvalCodes = if (ctx.INPUT_ROW != null && ctx.currentVars == null) {
--- End diff --
shouldn't be `ctx.currentVars != null`?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit pr...
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/19730#discussion_r150609735
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -1039,13 +1039,19 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
}
}
"""
- }.mkString("\n")
+ }
+ val fieldsEvalCodes = if (ctx.INPUT_ROW != null && ctx.currentVars == null) {
--- End diff --
If [`ctx.currentVars != null`](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L768), we need to use `mkString("\n")`.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem w...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19730
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem w...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19730
**[Test build #84064 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84064/testReport)** for PR 19730 at commit [`c46ef5c`](https://github.com/apache/spark/commit/c46ef5c90dd986d22c69df94507c490252b06ce5).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit pr...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19730#discussion_r150419888
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -1015,7 +1015,9 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
}
val rowClass = classOf[GenericInternalRow].getName
val result = ctx.freshName("result")
+ ctx.addMutableState(s"$rowClass", result, "")
--- End diff --
I think that since we not assigning it, we might keep it as a local variable and pass it to the generated methods. In this way we can avoid to introduce new global variables. Have you tried that?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem w...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19730
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem w...
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/19730
Working for this. I met another problem. [This code](https://github.com/kiszk/spark/blob/83fef403b92a96a13421901d161a0df5e6a6d7b3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L855) creates a lot of global variable.
Since `UTF8String.IntWrapper` can be reused, I am writing the following code.
```
val wrapper = "intWrapper"
if (!ctx.mutableStates.exists(s => s._1 == wrapper)) {
ctx.addMutableState("UTF8String.IntWrapper", wrapper,
s"$wrapper = new UTF8String.IntWrapper();")
}
```
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem w...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19730
**[Test build #83790 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83790/testReport)** for PR 19730 at commit [`94dfde2`](https://github.com/apache/spark/commit/94dfde25a395cfb1f221ab2018030b12c7011069).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem w...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19730
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem w...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19730
**[Test build #83970 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83970/testReport)** for PR 19730 at commit [`83fef40`](https://github.com/apache/spark/commit/83fef403b92a96a13421901d161a0df5e6a6d7b3).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem w...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19730
**[Test build #84079 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84079/testReport)** for PR 19730 at commit [`574691f`](https://github.com/apache/spark/commit/574691f3fa048129591fb4bafe3bdb07ebf5517e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit pr...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/19730
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem w...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19730
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem w...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/19730
Yeah. We can fix them in this PR.
BTW, could you check all the other calls of `addMutableState` and fix them too?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit pr...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19730#discussion_r151725673
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -1039,13 +1039,19 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
}
}
"""
- }.mkString("\n")
+ }
+ val fieldsEvalCodes = if (ctx.INPUT_ROW != null && ctx.currentVars == null) {
+ ctx.splitExpressions(fieldsEvalCode, "castStruct",
+ ("InternalRow", ctx.INPUT_ROW) :: (rowClass, result) :: ("InternalRow", tmpRow) :: Nil)
--- End diff --
I mean, we don't need to pass in `ctx.INPUT_ROW` to the split functions.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit pr...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19730#discussion_r151477316
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -1039,13 +1039,19 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
}
}
"""
- }.mkString("\n")
+ }
+ val fieldsEvalCodes = if (ctx.INPUT_ROW != null && ctx.currentVars == null) {
+ ctx.splitExpressions(fieldsEvalCode, "castStruct",
+ ("InternalRow", ctx.INPUT_ROW) :: (rowClass, result) :: ("InternalRow", tmpRow) :: Nil)
--- End diff --
how about inner struct? We also need to pass in the `ctx.INPUT_ROW`?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit pr...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19730#discussion_r150419908
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -1015,7 +1015,9 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
}
val rowClass = classOf[GenericInternalRow].getName
val result = ctx.freshName("result")
+ ctx.addMutableState(s"$rowClass", result, "")
val tmpRow = ctx.freshName("tmpRow")
+ ctx.addMutableState("InternalRow", tmpRow, "")
--- End diff --
ditto
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem w...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19730
**[Test build #84079 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84079/testReport)** for PR 19730 at commit [`574691f`](https://github.com/apache/spark/commit/574691f3fa048129591fb4bafe3bdb07ebf5517e).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem w...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19730
I think this is a different issue and should be fixed with another PR. @kiszk how about we change the test to cast int to long to avoid this issue?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit pr...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19730#discussion_r151723565
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala ---
@@ -827,4 +827,34 @@ class CastSuite extends SparkFunSuite with ExpressionEvalHelper {
checkEvaluation(cast(Literal.create(input, from), to), input)
}
+
+ test("SPARK-22500: cast for struct should not generate codes beyond 64KB") {
+ val N = 1000
+
+ val from1 = new StructType(
+ (1 to N).map(i => StructField(s"s$i", StringType)).toArray)
+ val to1 = new StructType(
+ (1 to N).map(i => StructField(s"i$i", IntegerType)).toArray)
+ val input1 = Row.fromSeq((1 to N).map(i => i.toString))
+ val output1 = Row.fromSeq((1 to N))
+ checkEvaluation(cast(Literal.create(input1, from1), to1), output1)
+
+ val from2 = new StructType(
+ (1 to N).map(i => StructField(s"a$i", ArrayType(StringType, containsNull = false))).toArray)
--- End diff --
or just test this case.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem w...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19730
thanks, merging to master!
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem w...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19730
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit pr...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19730#discussion_r151476367
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -1039,13 +1039,19 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
}
}
"""
- }.mkString("\n")
+ }
+ val fieldsEvalCodes = if (ctx.INPUT_ROW != null && ctx.currentVars == null) {
+ ctx.splitExpressions(fieldsEvalCode, "castStruct",
+ ("InternalRow", ctx.INPUT_ROW) :: (rowClass, result) :: ("InternalRow", tmpRow) :: Nil)
--- End diff --
what about inner struct? I think we should use `evPrim` here instead of `ctx.INPUT_ROW`.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem w...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19730
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84079/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit pr...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19730#discussion_r152222498
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala ---
@@ -827,4 +827,49 @@ class CastSuite extends SparkFunSuite with ExpressionEvalHelper {
checkEvaluation(cast(Literal.create(input, from), to), input)
}
+
+ test("SPARK-22500: cast for struct should not generate codes beyond 64KB") {
+ val N = 1000
+ val M = 250
+
+ val from1 = new StructType(
+ (1 to N).map(i => StructField(s"s$i", StringType)).toArray)
+ val to1 = new StructType(
+ (1 to N).map(i => StructField(s"i$i", IntegerType)).toArray)
+ val input1 = Row.fromSeq((1 to N).map(i => i.toString))
+ val output1 = Row.fromSeq((1 to N))
+ checkEvaluation(cast(Literal.create(input1, from1), to1), output1)
+
+ val from2 = new StructType(
+ (1 to N).map(i => StructField(s"a$i", ArrayType(StringType, containsNull = false))).toArray)
+ val to2 = new StructType(
+ (1 to N).map(i => StructField(s"i$i", ArrayType(IntegerType, containsNull = true))).toArray)
+ val input2 = Row.fromSeq((1 to N).map(_ => Seq("456", "true", "78.9")))
+ val output2 = Row.fromSeq((1 to N).map(_ => Seq(456, null, 78)))
+ checkEvaluation(cast(Literal.create(input2, from2), to2), output2)
+
+ val from3 = new StructType(
+ (1 to N).map(i => StructField(s"s$i",
+ StructType(Seq(StructField("l$i", IntegerType, nullable = true))))).toArray)
+ val to3 = new StructType(
+ (1 to N).map(i => StructField(s"s$i",
+ StructType(Seq(StructField("l$i", LongType, nullable = true))))).toArray)
+ val input3 = Row.fromSeq((1 to N).map(i => Row(i)))
+ val output3 = Row.fromSeq((1 to N).map(i => Row(i.toLong)))
+ checkEvaluation(cast(Literal.create(input3, from3), to3), output3)
+
+ val fromInner = new StructType(
+ (1 to M).map(i => StructField(s"s$i", DoubleType)).toArray)
+ val toInner = new StructType(
+ (1 to M).map(i => StructField(s"i$i", IntegerType)).toArray)
+ val inputInner = Row.fromSeq((1 to M).map(i => i + 0.5))
+ val outputInner = Row.fromSeq((1 to M))
+ val fromOuter = new StructType(
+ (1 to M).map(i => StructField(s"s$i", fromInner)).toArray)
+ val toOuter = new StructType(
+ (1 to M).map(i => StructField(s"s$i", toInner)).toArray)
+ val inputOuter = Row.fromSeq((1 to M).map(_ => inputInner))
+ val outputOuter = Row.fromSeq((1 to M).map(_ => outputInner))
+ checkEvaluation(cast(Literal.create(inputOuter, fromOuter), toOuter), outputOuter)
--- End diff --
I think this case is good enough to cover all the above cases?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem w...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19730
ping @kiszk
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem w...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19730
**[Test build #83790 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83790/testReport)** for PR 19730 at commit [`94dfde2`](https://github.com/apache/spark/commit/94dfde25a395cfb1f221ab2018030b12c7011069).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit pr...
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/19730#discussion_r151509423
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -1039,13 +1039,19 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
}
}
"""
- }.mkString("\n")
+ }
+ val fieldsEvalCodes = if (ctx.INPUT_ROW != null && ctx.currentVars == null) {
+ ctx.splitExpressions(fieldsEvalCode, "castStruct",
+ ("InternalRow", ctx.INPUT_ROW) :: (rowClass, result) :: ("InternalRow", tmpRow) :: Nil)
--- End diff --
Inner struct case in the following code works well.
```
val struct = Literal.create(
Row(
UTF8String.fromString("123.4"),
Seq("456", "true", "78.9"),
Row(7)),
StructType(Seq(
StructField("i", StringType, nullable = true),
StructField("a",
ArrayType(StringType, containsNull = false), nullable = true),
StructField("s",
StructType(Seq(
StructField("i", IntegerType, nullable = true)))))))
val ret = cast(struct, StructType(Seq(
StructField("d", DoubleType, nullable = true),
StructField("a",
ArrayType(IntegerType, containsNull = true), nullable = true),
StructField("s",
StructType(Seq(
StructField("l", LongType, nullable = true)))))))
assert(ret.resolved === true)
checkEvaluation(ret, Row(123.4, Seq(456, null, 78), Row(7L)))
```
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem w...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19730
**[Test build #83792 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83792/testReport)** for PR 19730 at commit [`39ffaa9`](https://github.com/apache/spark/commit/39ffaa9e59131dcd955d280c26434d931a61c834).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem w...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19730
**[Test build #83750 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83750/testReport)** for PR 19730 at commit [`9013671`](https://github.com/apache/spark/commit/90136717ca01a7fe426963fffabe3516d506382d).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem w...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19730
**[Test build #84064 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84064/testReport)** for PR 19730 at commit [`c46ef5c`](https://github.com/apache/spark/commit/c46ef5c90dd986d22c69df94507c490252b06ce5).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem w...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19730
**[Test build #83792 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83792/testReport)** for PR 19730 at commit [`39ffaa9`](https://github.com/apache/spark/commit/39ffaa9e59131dcd955d280c26434d931a61c834).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem w...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19730
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83792/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem w...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19730
**[Test build #83970 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83970/testReport)** for PR 19730 at commit [`83fef40`](https://github.com/apache/spark/commit/83fef403b92a96a13421901d161a0df5e6a6d7b3).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem w...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19730
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84064/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem w...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19730
**[Test build #83750 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83750/testReport)** for PR 19730 at commit [`9013671`](https://github.com/apache/spark/commit/90136717ca01a7fe426963fffabe3516d506382d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem w...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19730
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83970/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit pr...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19730#discussion_r151723430
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala ---
@@ -827,4 +827,34 @@ class CastSuite extends SparkFunSuite with ExpressionEvalHelper {
checkEvaluation(cast(Literal.create(input, from), to), input)
}
+
+ test("SPARK-22500: cast for struct should not generate codes beyond 64KB") {
+ val N = 1000
+
+ val from1 = new StructType(
+ (1 to N).map(i => StructField(s"s$i", StringType)).toArray)
+ val to1 = new StructType(
+ (1 to N).map(i => StructField(s"i$i", IntegerType)).toArray)
+ val input1 = Row.fromSeq((1 to N).map(i => i.toString))
+ val output1 = Row.fromSeq((1 to N))
+ checkEvaluation(cast(Literal.create(input1, from1), to1), output1)
+
+ val from2 = new StructType(
+ (1 to N).map(i => StructField(s"a$i", ArrayType(StringType, containsNull = false))).toArray)
--- End diff --
I'd expect something like
```
val from2 = new StructType(
(1 to N).map(i => StructField(s"s$i", from1)).toArray)
```
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem w...
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/19730
@cloud-fan I see, I will create another PR to fix this global variable issue.
@gatorsmile I will check other calls.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19730: [SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem w...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19730
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83790/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org