You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by liufengdb <gi...@git.apache.org> on 2017/10/09 20:04:44 UTC
[GitHub] spark pull request #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHo...
GitHub user liufengdb opened a pull request:
https://github.com/apache/spark/pull/19460
[SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder and add a test
## What changes were proposed in this pull request?
We should not break the assumption that the length of the allocated byte array is word rounded:
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java#L170
So we want to use `Integer.MAX_VALUE - 15` instead of `Integer.MAX_VALUE - 8` as the upper bound of an allocated byte array.
## How was this patch tested?
Since the Spark unit test JVM has less than 1GB heap, here we run the test code as a submit job, so it can run on a JVM has 4GB memory.
Please review http://spark.apache.org/contributing.html before opening a pull request.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/liufengdb/spark fix_array_max
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19460.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19460
----
commit 92a6d2d53aea02042d47888e99df5a4f2167cd1f
Author: Feng Liu <fe...@databricks.com>
Date: 2017-10-09T17:43:39Z
init
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19460
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19460
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19460
**[Test build #82564 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82564/testReport)** for PR 19460 at commit [`09a1d5f`](https://github.com/apache/spark/commit/09a1d5fb689a979e5a48de7f90dfbc1f066bea86).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19460
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82572/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/19460
Thanks! Merged to master.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/19460
LGTM
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19460
**[Test build #82572 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82572/testReport)** for PR 19460 at commit [`0f82f2d`](https://github.com/apache/spark/commit/0f82f2d5d49fc64e7a8ac4714900417a55ba72d1).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/19460
LGTM
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19460
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHo...
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/19460#discussion_r143571046
--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/array/ByteArrayMethods.java ---
@@ -40,6 +40,15 @@ public static int roundNumberOfBytesToNearestWord(int numBytes) {
}
}
+ // Some JVMs can't allocate arrays of length Integer.MAX_VALUE; actual max is somewhat smaller.
+ // Be conservative and lower the cap a little.
+ public static int MAX_ARRAY_LENGTH = Integer.MAX_VALUE - 8;
--- End diff --
I get declaring a constant though it doesn't just pertain to byte arrays. I couldn't find a good place for it. I don't know if its reused so much that it needs this
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHo...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/19460
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19460
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82567/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHo...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/19460#discussion_r143564830
--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/array/ByteArrayMethods.java ---
@@ -40,6 +40,15 @@ public static int roundNumberOfBytesToNearestWord(int numBytes) {
}
}
+ // Some JVMs can't allocate arrays of length Integer.MAX_VALUE; actual max is somewhat smaller.
+ // Be conservative and lower the cap a little.
+ public static int MAX_ARRAY_LENGTH = Integer.MAX_VALUE - 8;
--- End diff --
`- 15`
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19460
**[Test build #82564 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82564/testReport)** for PR 19460 at commit [`09a1d5f`](https://github.com/apache/spark/commit/09a1d5fb689a979e5a48de7f90dfbc1f066bea86).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19460
**[Test build #82562 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82562/testReport)** for PR 19460 at commit [`92a6d2d`](https://github.com/apache/spark/commit/92a6d2d53aea02042d47888e99df5a4f2167cd1f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19460
**[Test build #82567 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82567/testReport)** for PR 19460 at commit [`90ecbcc`](https://github.com/apache/spark/commit/90ecbcc4f6909d7243a69014d5f76753fb451452).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19460
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHo...
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/19460#discussion_r143570916
--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/array/ByteArrayMethods.java ---
@@ -40,6 +40,15 @@ public static int roundNumberOfBytesToNearestWord(int numBytes) {
}
}
+ // Some JVMs can't allocate arrays of length Integer.MAX_VALUE; actual max is somewhat smaller.
+ // Be conservative and lower the cap a little.
+ public static int MAX_ARRAY_LENGTH = Integer.MAX_VALUE - 8;
+
+ // Use this value if the allocated byte arrays are used to store other types rather than bytes.
+ public static int maxWordRoundedArrayLength() {
+ return roundNumberOfBytesToNearestWord(MAX_ARRAY_LENGTH - 8);
--- End diff --
Rounding won't do; it has to round _down_. I see that's why you wrote -8 here but it's a little confusing as -7 works more directly. And then, why bother? just write -15 in the constant above and explain.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19460
**[Test build #82562 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82562/testReport)** for PR 19460 at commit [`92a6d2d`](https://github.com/apache/spark/commit/92a6d2d53aea02042d47888e99df5a4f2167cd1f).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHo...
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/19460#discussion_r143572060
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolderSuite.scala ---
@@ -37,3 +43,54 @@ class BufferHolderSuite extends SparkFunSuite {
assert(e.getMessage.contains("exceeds size limitation"))
}
}
+
+// A test for growing the buffer holder to nearly 2GB. Due to the heap size limitation of the Spark
+// unit tests JVM, the actually test code is running as a submit job.
+class BufferHolderSparkSubmitSuite
--- End diff --
Separate file?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19460
**[Test build #82567 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82567/testReport)** for PR 19460 at commit [`90ecbcc`](https://github.com/apache/spark/commit/90ecbcc4f6909d7243a69014d5f76753fb451452).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19460
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82562/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19460
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82564/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHo...
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/19460#discussion_r143640733
--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolder.java ---
@@ -36,9 +37,7 @@
*/
public class BufferHolder {
- // Some JVMs can't allocate arrays of length Integer.MAX_VALUE; actual max is somewhat
- // smaller. Be conservative and lower the cap a little.
- private static final int ARRAY_MAX = Integer.MAX_VALUE - 8;
+ private static final int ARRAY_MAX = ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH;
--- End diff --
Ok, these constants became redundant. No big deal. We don't need to rush this though
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19460
**[Test build #82572 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82572/testReport)** for PR 19460 at commit [`0f82f2d`](https://github.com/apache/spark/commit/0f82f2d5d49fc64e7a8ac4714900417a55ba72d1).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org