You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by liufengdb <gi...@git.apache.org> on 2017/10/09 20:04:44 UTC

[GitHub] spark pull request #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHo...

GitHub user liufengdb opened a pull request:

    https://github.com/apache/spark/pull/19460

    [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder and add a test

    ## What changes were proposed in this pull request?
    
    We should not break the assumption that the length of the allocated byte array is word rounded:  
    https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java#L170
    So we want to use `Integer.MAX_VALUE - 15` instead of `Integer.MAX_VALUE - 8` as the upper bound of an allocated byte array. 
    
    ## How was this patch tested?
    
    Since the Spark unit test JVM has less than 1GB heap, here we run the test code as a submit job, so it can run on a JVM has 4GB memory. 
    
    Please review http://spark.apache.org/contributing.html before opening a pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liufengdb/spark fix_array_max

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19460.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19460
    
----
commit 92a6d2d53aea02042d47888e99df5a4f2167cd1f
Author: Feng Liu <fe...@databricks.com>
Date:   2017-10-09T17:43:39Z

    init

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19460
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19460
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19460
  
    **[Test build #82564 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82564/testReport)** for PR 19460 at commit [`09a1d5f`](https://github.com/apache/spark/commit/09a1d5fb689a979e5a48de7f90dfbc1f066bea86).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19460
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82572/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19460
  
    Thanks! Merged to master.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/19460
  
    LGTM


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19460
  
    **[Test build #82572 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82572/testReport)** for PR 19460 at commit [`0f82f2d`](https://github.com/apache/spark/commit/0f82f2d5d49fc64e7a8ac4714900417a55ba72d1).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19460
  
    LGTM


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19460
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHo...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19460#discussion_r143571046
  
    --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/array/ByteArrayMethods.java ---
    @@ -40,6 +40,15 @@ public static int roundNumberOfBytesToNearestWord(int numBytes) {
         }
       }
     
    +  // Some JVMs can't allocate arrays of length Integer.MAX_VALUE; actual max is somewhat smaller.
    +  // Be conservative and lower the cap a little.
    +  public static int MAX_ARRAY_LENGTH = Integer.MAX_VALUE - 8;
    --- End diff --
    
    I get declaring a constant though it doesn't just pertain to byte arrays. I couldn't find a good place for it. I don't know if its reused so much that it needs this


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHo...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/19460


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19460
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82567/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHo...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19460#discussion_r143564830
  
    --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/array/ByteArrayMethods.java ---
    @@ -40,6 +40,15 @@ public static int roundNumberOfBytesToNearestWord(int numBytes) {
         }
       }
     
    +  // Some JVMs can't allocate arrays of length Integer.MAX_VALUE; actual max is somewhat smaller.
    +  // Be conservative and lower the cap a little.
    +  public static int MAX_ARRAY_LENGTH = Integer.MAX_VALUE - 8;
    --- End diff --
    
    `- 15`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19460
  
    **[Test build #82564 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82564/testReport)** for PR 19460 at commit [`09a1d5f`](https://github.com/apache/spark/commit/09a1d5fb689a979e5a48de7f90dfbc1f066bea86).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19460
  
    **[Test build #82562 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82562/testReport)** for PR 19460 at commit [`92a6d2d`](https://github.com/apache/spark/commit/92a6d2d53aea02042d47888e99df5a4f2167cd1f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19460
  
    **[Test build #82567 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82567/testReport)** for PR 19460 at commit [`90ecbcc`](https://github.com/apache/spark/commit/90ecbcc4f6909d7243a69014d5f76753fb451452).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19460
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHo...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19460#discussion_r143570916
  
    --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/array/ByteArrayMethods.java ---
    @@ -40,6 +40,15 @@ public static int roundNumberOfBytesToNearestWord(int numBytes) {
         }
       }
     
    +  // Some JVMs can't allocate arrays of length Integer.MAX_VALUE; actual max is somewhat smaller.
    +  // Be conservative and lower the cap a little.
    +  public static int MAX_ARRAY_LENGTH = Integer.MAX_VALUE - 8;
    +
    +  // Use this value if the allocated byte arrays are used to store other types rather than bytes.
    +  public static int maxWordRoundedArrayLength() {
    +    return roundNumberOfBytesToNearestWord(MAX_ARRAY_LENGTH - 8);
    --- End diff --
    
    Rounding won't do; it has to round _down_. I see that's why you wrote -8 here but it's a little confusing as -7 works more directly. And then, why bother? just write -15 in the constant above and explain. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19460
  
    **[Test build #82562 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82562/testReport)** for PR 19460 at commit [`92a6d2d`](https://github.com/apache/spark/commit/92a6d2d53aea02042d47888e99df5a4f2167cd1f).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHo...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19460#discussion_r143572060
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolderSuite.scala ---
    @@ -37,3 +43,54 @@ class BufferHolderSuite extends SparkFunSuite {
         assert(e.getMessage.contains("exceeds size limitation"))
       }
     }
    +
    +// A test for growing the buffer holder to nearly 2GB. Due to the heap size limitation of the Spark
    +// unit tests JVM, the actually test code is running as a submit job.
    +class BufferHolderSparkSubmitSuite
    --- End diff --
    
    Separate file?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19460
  
    **[Test build #82567 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82567/testReport)** for PR 19460 at commit [`90ecbcc`](https://github.com/apache/spark/commit/90ecbcc4f6909d7243a69014d5f76753fb451452).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19460
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82562/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19460
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82564/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHo...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19460#discussion_r143640733
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolder.java ---
    @@ -36,9 +37,7 @@
      */
     public class BufferHolder {
     
    -  // Some JVMs can't allocate arrays of length Integer.MAX_VALUE; actual max is somewhat
    -  // smaller. Be conservative and lower the cap a little.
    -  private static final int ARRAY_MAX = Integer.MAX_VALUE - 8;
    +  private static final int ARRAY_MAX = ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH;
    --- End diff --
    
    Ok, these constants became redundant. No big deal. We don't need to rush this though


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder an...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19460
  
    **[Test build #82572 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82572/testReport)** for PR 19460 at commit [`0f82f2d`](https://github.com/apache/spark/commit/0f82f2d5d49fc64e7a8ac4714900417a55ba72d1).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org