You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by kiszk <gi...@git.apache.org> on 2017/11/24 16:36:00 UTC
[GitHub] spark pull request #19817: [SPARK-22603][SQL] Fix 64KB JVM bytecode limit pr...
GitHub user kiszk opened a pull request:
https://github.com/apache/spark/pull/19817
[SPARK-22603][SQL] Fix 64KB JVM bytecode limit problem with FormatString
## What changes were proposed in this pull request?
This PR changes `FormatString` code generation to place generated code for expressions for arguments into separated methods if these size could be large.
This PR passes variable arguments by using an `Object` array.
## How was this patch tested?
Added new test cases into `StringExpressionSuite`
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/kiszk/spark SPARK-22603
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19817.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19817
----
commit e5bcbc41933a2bf81201b9e7c753b3405b30e0db
Author: Kazuaki Ishizaki <is...@jp.ibm.com>
Date: 2017-11-24T16:34:38Z
initial commit
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19817: [SPARK-22603][SQL] Fix 64KB JVM bytecode limit problem w...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19817
**[Test build #84180 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84180/testReport)** for PR 19817 at commit [`84c8bf4`](https://github.com/apache/spark/commit/84c8bf4fa068e3e5247f3b69367f7edba09e6684).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19817: [SPARK-22603][SQL] Fix 64KB JVM bytecode limit pr...
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/19817#discussion_r153065055
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
@@ -1372,19 +1372,30 @@ case class FormatString(children: Expression*) extends Expression with ImplicitC
val pattern = children.head.genCode(ctx)
val argListGen = children.tail.map(x => (x.dataType, x.genCode(ctx)))
- val argListCode = argListGen.map(_._2.code + "\n")
-
- val argListString = argListGen.foldLeft("")((s, v) => {
- val nullSafeString =
+ val argList = ctx.freshName("argLists")
+ val numArgLists = argListGen.length
+ val argListCode = argListGen.zipWithIndex.map { case(v, index) =>
+ val value =
if (ctx.boxedType(v._1) != ctx.javaType(v._1)) {
// Java primitives get boxed in order to allow null values.
s"(${v._2.isNull}) ? (${ctx.boxedType(v._1)}) null : " +
s"new ${ctx.boxedType(v._1)}(${v._2.value})"
} else {
s"(${v._2.isNull}) ? null : ${v._2.value}"
}
- s + "," + nullSafeString
- })
+ s"""
+ ${v._2.code}
+ $argList[$index] = $value;
+ """
+ }
+ val argListCodes = if (ctx.INPUT_ROW != null && ctx.currentVars == null) {
+ ctx.splitExpressions(
+ expressions = argListCode,
+ funcName = "valueFormatString",
+ arguments = ("InternalRow", ctx.INPUT_ROW) :: ("Object[]", argList) :: Nil)
+ } else {
+ argListCode.mkString("\n")
+ }
--- End diff --
Thank you for your comment. After writing the above comment, I thought it would be good to create a separate PR.
Thus, I have just submitted a PR #19821 as WIP. That PR should wait for merging this PR and then I will do rebase for #19821.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19817: [SPARK-22603][SQL] Fix 64KB JVM bytecode limit problem w...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19817
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19817: [SPARK-22603][SQL] Fix 64KB JVM bytecode limit pr...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19817#discussion_r153012255
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
@@ -1372,19 +1372,26 @@ case class FormatString(children: Expression*) extends Expression with ImplicitC
val pattern = children.head.genCode(ctx)
val argListGen = children.tail.map(x => (x.dataType, x.genCode(ctx)))
- val argListCode = argListGen.map(_._2.code + "\n")
-
- val argListString = argListGen.foldLeft("")((s, v) => {
- val nullSafeString =
- if (ctx.boxedType(v._1) != ctx.javaType(v._1)) {
- // Java primitives get boxed in order to allow null values.
- s"(${v._2.isNull}) ? (${ctx.boxedType(v._1)}) null : " +
- s"new ${ctx.boxedType(v._1)}(${v._2.value})"
- } else {
- s"(${v._2.isNull}) ? null : ${v._2.value}"
- }
- s + "," + nullSafeString
- })
+ val argList = ctx.freshName("argLists")
+ val numArgLists = argListGen.length
+ val argListCode = argListGen.zipWithIndex.map { case(v, index) =>
+ v._2.code + s"\n$argList[$index] = " +
--- End diff --
would it be possible to use the syntax:
```
s"""
...
"""
```
as it is done in many other places? I think it would be more readable. What do you think?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19817: [SPARK-22603][SQL] Fix 64KB JVM bytecode limit pr...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/19817
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19817: [SPARK-22603][SQL] Fix 64KB JVM bytecode limit pr...
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/19817#discussion_r153060800
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
@@ -1372,19 +1372,30 @@ case class FormatString(children: Expression*) extends Expression with ImplicitC
val pattern = children.head.genCode(ctx)
val argListGen = children.tail.map(x => (x.dataType, x.genCode(ctx)))
- val argListCode = argListGen.map(_._2.code + "\n")
-
- val argListString = argListGen.foldLeft("")((s, v) => {
- val nullSafeString =
+ val argList = ctx.freshName("argLists")
+ val numArgLists = argListGen.length
+ val argListCode = argListGen.zipWithIndex.map { case(v, index) =>
+ val value =
if (ctx.boxedType(v._1) != ctx.javaType(v._1)) {
// Java primitives get boxed in order to allow null values.
s"(${v._2.isNull}) ? (${ctx.boxedType(v._1)}) null : " +
s"new ${ctx.boxedType(v._1)}(${v._2.value})"
} else {
s"(${v._2.isNull}) ? null : ${v._2.value}"
}
- s + "," + nullSafeString
- })
+ s"""
+ ${v._2.code}
+ $argList[$index] = $value;
+ """
+ }
+ val argListCodes = if (ctx.INPUT_ROW != null && ctx.currentVars == null) {
+ ctx.splitExpressions(
+ expressions = argListCode,
+ funcName = "valueFormatString",
+ arguments = ("InternalRow", ctx.INPUT_ROW) :: ("Object[]", argList) :: Nil)
+ } else {
+ argListCode.mkString("\n")
+ }
--- End diff --
Sure
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19817: [SPARK-22603][SQL] Fix 64KB JVM bytecode limit problem w...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19817
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84173/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19817: [SPARK-22603][SQL] Fix 64KB JVM bytecode limit pr...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/19817#discussion_r153060489
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
@@ -1372,19 +1372,30 @@ case class FormatString(children: Expression*) extends Expression with ImplicitC
val pattern = children.head.genCode(ctx)
val argListGen = children.tail.map(x => (x.dataType, x.genCode(ctx)))
- val argListCode = argListGen.map(_._2.code + "\n")
-
- val argListString = argListGen.foldLeft("")((s, v) => {
- val nullSafeString =
+ val argList = ctx.freshName("argLists")
+ val numArgLists = argListGen.length
+ val argListCode = argListGen.zipWithIndex.map { case(v, index) =>
+ val value =
if (ctx.boxedType(v._1) != ctx.javaType(v._1)) {
// Java primitives get boxed in order to allow null values.
s"(${v._2.isNull}) ? (${ctx.boxedType(v._1)}) null : " +
s"new ${ctx.boxedType(v._1)}(${v._2.value})"
} else {
s"(${v._2.isNull}) ? null : ${v._2.value}"
}
- s + "," + nullSafeString
- })
+ s"""
+ ${v._2.code}
+ $argList[$index] = $value;
+ """
+ }
+ val argListCodes = if (ctx.INPUT_ROW != null && ctx.currentVars == null) {
+ ctx.splitExpressions(
+ expressions = argListCode,
+ funcName = "valueFormatString",
+ arguments = ("InternalRow", ctx.INPUT_ROW) :: ("Object[]", argList) :: Nil)
+ } else {
+ argListCode.mkString("\n")
+ }
--- End diff --
Could you create a `splitExpressions` in `CodegenContext` for avoiding the duplicate codes, like
```Scala
if (ctx.INPUT_ROW != null && ctx.currentVars == null) {
ctx.splitExpressions(...)
} else {
inputs.mkString("\n")
}
```
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19817: [SPARK-22603][SQL] Fix 64KB JVM bytecode limit problem w...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19817
**[Test build #84173 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84173/testReport)** for PR 19817 at commit [`e5bcbc4`](https://github.com/apache/spark/commit/e5bcbc41933a2bf81201b9e7c753b3405b30e0db).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19817: [SPARK-22603][SQL] Fix 64KB JVM bytecode limit problem w...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19817
**[Test build #84180 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84180/testReport)** for PR 19817 at commit [`84c8bf4`](https://github.com/apache/spark/commit/84c8bf4fa068e3e5247f3b69367f7edba09e6684).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19817: [SPARK-22603][SQL] Fix 64KB JVM bytecode limit problem w...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19817
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19817: [SPARK-22603][SQL] Fix 64KB JVM bytecode limit problem w...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/19817
LGTM
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19817: [SPARK-22603][SQL] Fix 64KB JVM bytecode limit problem w...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19817
**[Test build #84173 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84173/testReport)** for PR 19817 at commit [`e5bcbc4`](https://github.com/apache/spark/commit/e5bcbc41933a2bf81201b9e7c753b3405b30e0db).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19817: [SPARK-22603][SQL] Fix 64KB JVM bytecode limit problem w...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19817
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84180/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19817: [SPARK-22603][SQL] Fix 64KB JVM bytecode limit pr...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19817#discussion_r153063159
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
@@ -1372,19 +1372,30 @@ case class FormatString(children: Expression*) extends Expression with ImplicitC
val pattern = children.head.genCode(ctx)
val argListGen = children.tail.map(x => (x.dataType, x.genCode(ctx)))
- val argListCode = argListGen.map(_._2.code + "\n")
-
- val argListString = argListGen.foldLeft("")((s, v) => {
- val nullSafeString =
+ val argList = ctx.freshName("argLists")
+ val numArgLists = argListGen.length
+ val argListCode = argListGen.zipWithIndex.map { case(v, index) =>
+ val value =
if (ctx.boxedType(v._1) != ctx.javaType(v._1)) {
// Java primitives get boxed in order to allow null values.
s"(${v._2.isNull}) ? (${ctx.boxedType(v._1)}) null : " +
s"new ${ctx.boxedType(v._1)}(${v._2.value})"
} else {
s"(${v._2.isNull}) ? null : ${v._2.value}"
}
- s + "," + nullSafeString
- })
+ s"""
+ ${v._2.code}
+ $argList[$index] = $value;
+ """
+ }
+ val argListCodes = if (ctx.INPUT_ROW != null && ctx.currentVars == null) {
+ ctx.splitExpressions(
+ expressions = argListCode,
+ funcName = "valueFormatString",
+ arguments = ("InternalRow", ctx.INPUT_ROW) :: ("Object[]", argList) :: Nil)
+ } else {
+ argListCode.mkString("\n")
+ }
--- End diff --
I wanted to submit a PR for that, but I was waiting for all the PRs related (this one should be the last one) to be merged. Let me know if I can do that or you are doing it. My suggestion would be: can't we just change the existing method to include this check?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19817: [SPARK-22603][SQL] Fix 64KB JVM bytecode limit problem w...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19817
LGTM, merging to master/2.2!
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org