Posted to reviews@spark.apache.org by sitalkedia <gi...@git.apache.org> on 2016/05/13 20:29:54 UTC

[GitHub] spark pull request: [SPARK-13850] Force the sorter to Spill when n...

GitHub user sitalkedia opened a pull request:

    https://github.com/apache/spark/pull/13107

    [SPARK-13850] Force the sorter to Spill when number of elements in th…

    ## What changes were proposed in this pull request?
    
    Force the sorter to spill when the number of elements in the pointer array reaches a certain size. This works around the issue of TimSort failing on large buffer sizes.
    
    ## How was this patch tested?
    
    Tested by running a job that failed without this change because of the TimSort bug.
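
The behaviour described above can be illustrated with a minimal sketch. It is not the code from the patch: `ForceSpillSketch` and its methods are hypothetical names, and "spilling" is reduced to clearing an in-memory list so the example runs without Spark. It only shows the idea of checking an element count against a threshold before each insert.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy in-memory buffer that spills once it holds a configured number of records. */
class ForceSpillSketch {
  // Hypothetical stand-in for the threshold introduced by the pull request.
  private final long numElementsForSpillThreshold;
  private final List<Long> inMemoryPointers = new ArrayList<>();
  private int spillCount = 0;

  ForceSpillSketch(long numElementsForSpillThreshold) {
    this.numElementsForSpillThreshold = numElementsForSpillThreshold;
  }

  /** Inserts one record pointer, spilling first if the buffer has reached the threshold. */
  void insertRecord(long recordPointer) {
    if (inMemoryPointers.size() >= numElementsForSpillThreshold) {
      spill();
    }
    inMemoryPointers.add(recordPointer);
  }

  /** Stand-in for sorting the buffer and writing it to disk; here it only clears the buffer. */
  private void spill() {
    inMemoryPointers.clear();
    spillCount++;
  }

  int getSpillCount() {
    return spillCount;
  }

  int getInMemoryCount() {
    return inMemoryPointers.size();
  }

  public static void main(String[] args) {
    ForceSpillSketch sorter = new ForceSpillSketch(3);
    for (long i = 0; i < 10; i++) {
      sorter.insertRecord(i);
    }
    System.out.println(sorter.getSpillCount() + " spills, "
        + sorter.getInMemoryCount() + " records still in memory");
  }
}
```

Because the check happens before the insert, the buffer never holds more than the threshold number of entries, which is what bounds the size of the array handed to the sort.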

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sitalkedia/spark fix_TimSort

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13107.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13107
    
----
commit ab601e3383ebda351398f89528e104a7bfd8f37f
Author: Sital Kedia <sk...@fb.com>
Date:   2016-05-13T17:39:13Z

    [SPARK-13850] Force the sorter to Spill when number of elements in the pointer array reach a certain size. This is to workaround the issue of timSort failing on large buffer size.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13850] Force the sorter to Spill when n...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the pull request:

    https://github.com/apache/spark/pull/13107#issuecomment-221638365
  
    @rxin - We have seen this issue go away when we limit the pointer array size to within 1G. Updated the PR to set that as the default value.
    
    @davies - There is another issue with the allocation of the temporary buffer you mentioned. That buffer is not managed by the MemoryManager, and we often see executor OOMs because of it. I have opened a JIRA (SPARK-15391) for this. It would be great to have that issue fixed.




[GitHub] spark pull request: [SPARK-13850] Force the sorter to Spill when n...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/13107#issuecomment-219612852
  
    What is the root cause? Can you also add a regression test?




[GitHub] spark issue #13107: [SPARK-13850] Force the sorter to Spill when number of e...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the issue:

    https://github.com/apache/spark/pull/13107
  
    @davies - Addressed all comments and fixed test cases. 




[GitHub] spark pull request #13107: [SPARK-13850] Force the sorter to Spill when numb...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13107#discussion_r67390998
  
    --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
    @@ -72,7 +72,11 @@
       private final TaskContext taskContext;
       private final ShuffleWriteMetrics writeMetrics;
     
    -  /** Force this sorter to spill when there are this many elements in memory. For testing only */
    +  /**
    +   * Force this sorter to spill when there are this many elements in memory. This is to workaround
    +   * the issue of timSort failing on large buffer size (SPARK-13850). The default value is
    +   * 1024 * 1024 * 1024 / 8 which allows the maximum size of the pointer array to be 1G.
    --- End diff --
    
    Changed.




[GitHub] spark pull request #13107: [SPARK-13850] Force the sorter to Spill when numb...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13107#discussion_r65928685
  
    --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
    @@ -113,7 +117,7 @@
         // Use getSizeAsKb (not bytes) to maintain backwards compatibility if no units are provided
         this.fileBufferSizeBytes = (int) conf.getSizeAsKb("spark.shuffle.file.buffer", "32k") * 1024;
         this.numElementsForSpillThreshold =
    -      conf.getLong("spark.shuffle.spill.numElementsForceSpillThreshold", Long.MAX_VALUE);
    +      conf.getLong("spark.shuffle.spill.numElementsForceSpillThreshold", (1024 * 1024 * 1024 / 8));
    --- End diff --
    
    numElementsForSpillThreshold should be less than `1024 * 1024 * 1024 / 2`, because radix sort can only use half of the memory.
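
To make the sizes under discussion concrete, here is a small sketch of the arithmetic. It assumes each pointer-array entry is one 8-byte long (which is what the "1G" figure for `1024 * 1024 * 1024 / 8` elements implies) and models the "half of the memory" remark as a scratch region of the same size as the data. `PointerArraySizing` and its methods are hypothetical names, not part of Spark.

```java
/** Back-of-the-envelope sizes for the thresholds discussed in this thread. */
class PointerArraySizing {
  // Assumption: each pointer-array entry is a single 8-byte long.
  static final long BYTES_PER_ENTRY = 8L;

  /** Bytes occupied by a pointer array holding numElements entries. */
  static long pointerArrayBytes(long numElements) {
    return numElements * BYTES_PER_ENTRY;
  }

  /** Bytes needed when the sort also requires a scratch region as large as the data. */
  static long bytesWithSortBuffer(long numElements) {
    return 2L * pointerArrayBytes(numElements);
  }

  public static void main(String[] args) {
    long oneGiB = 1024L * 1024 * 1024;
    // 1024 * 1024 * 1024 is 2^30, which still fits in a Java int, so the
    // literal expressions from the diff do not overflow before the division.
    long eighth = 1024 * 1024 * 1024 / 8;
    long half = 1024 * 1024 * 1024 / 2;
    System.out.println(eighth + " elements -> " + pointerArrayBytes(eighth) / oneGiB + " GiB");
    System.out.println(half + " elements -> " + pointerArrayBytes(half) / oneGiB + " GiB");
    System.out.println(half + " elements with scratch -> "
        + bytesWithSortBuffer(half) / oneGiB + " GiB");
  }
}
```

Under these assumptions the `/ 8` threshold corresponds to 1 GiB of entries, and the `/ 2` threshold to 4 GiB of entries or 8 GiB including scratch space. Whether that is how the "8G" figure mentioned elsewhere in the thread was derived is not stated.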




[GitHub] spark issue #13107: [SPARK-13850] Force the sorter to Spill when number of e...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13107
  
    **[Test build #61549 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61549/consoleFull)** for PR 13107 at commit [`11dc238`](https://github.com/apache/spark/commit/11dc23809e457d7f53dc69947e778ea13538aab0).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-13850] Force the sorter to Spill when n...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13107#discussion_r63266669
  
    --- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java ---
    @@ -24,6 +24,8 @@
     import java.util.Queue;
     
     import com.google.common.annotations.VisibleForTesting;
    +import org.apache.spark.SparkConf;
    +import org.apache.spark.SparkEnv;
    --- End diff --
    
    (It seems imports might have to be reordered, https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide#SparkCodeStyleGuide-Imports)




[GitHub] spark issue #13107: [SPARK-13850] Force the sorter to Spill when number of e...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the issue:

    https://github.com/apache/spark/pull/13107
  
    @davies - Can you take a look?




[GitHub] spark issue #13107: [SPARK-13850] Force the sorter to Spill when number of e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13107
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61484/
    Test FAILed.




[GitHub] spark issue #13107: [SPARK-13850] Force the sorter to Spill when number of e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13107
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #13107: [SPARK-13850] Force the sorter to Spill when number of e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13107
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #13107: [SPARK-13850] Force the sorter to Spill when number of e...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13107
  
    **[Test build #61423 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61423/consoleFull)** for PR 13107 at commit [`cab4357`](https://github.com/apache/spark/commit/cab43577b53b8a9bb3562947256658389ffa6ad8).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #13107: [SPARK-13850] Force the sorter to Spill when numb...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13107#discussion_r67391476
  
    --- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java ---
    @@ -143,6 +151,8 @@ private UnsafeExternalSorter(
           this.inMemSorter = existingInMemorySorter;
         }
         this.peakMemoryUsedBytes = getMemoryUsage();
    +    this.numElementsForSpillThreshold =
    +      SparkEnv.get().conf().getLong("spark.shuffle.spill.numElementsForceSpillThreshold", (1024 * 1024 * 1024 / 8));
    --- End diff --
    
    It should be 1024 * 1024 * 1024 / 2, right? It's useful for us to have this configurable, because we have been seeing JVM OOMs and segfaults when allocating ridiculously large buffers in memory.
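
The diff above reads the threshold with `conf().getLong(key, default)`. The sketch below mimics that lookup with a plain `Map` so that it runs without Spark on the classpath. `SpillThresholdConfig` and `resolveThreshold` are hypothetical names, and the default shown is the `/ 2` value proposed in this comment, not necessarily what was finally merged.

```java
import java.util.HashMap;
import java.util.Map;

/** Resolves the force-spill threshold from string settings, falling back to a default. */
class SpillThresholdConfig {
  // Configuration key taken from the diff under review.
  static final String KEY = "spark.shuffle.spill.numElementsForceSpillThreshold";
  // Assumed default: the value suggested in the review comment.
  static final long DEFAULT_THRESHOLD = 1024 * 1024 * 1024 / 2;

  /** Returns the configured threshold, or DEFAULT_THRESHOLD when the key is absent. */
  static long resolveThreshold(Map<String, String> settings) {
    String raw = settings.get(KEY);
    return raw == null ? DEFAULT_THRESHOLD : Long.parseLong(raw.trim());
  }

  public static void main(String[] args) {
    Map<String, String> settings = new HashMap<>();
    System.out.println("default: " + resolveThreshold(settings));

    // A job that hits allocation failures could lower the threshold explicitly.
    settings.put(KEY, "134217728");
    System.out.println("override: " + resolveThreshold(settings));
  }
}
```

Keeping the value configurable, as requested here, lets a job that hits large-allocation failures lower the threshold without a code change.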




[GitHub] spark issue #13107: [SPARK-13850] Force the sorter to Spill when number of e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13107
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61423/
    Test FAILed.




[GitHub] spark issue #13107: [SPARK-13850] Force the sorter to Spill when number of e...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13107
  
    **[Test build #61549 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61549/consoleFull)** for PR 13107 at commit [`11dc238`](https://github.com/apache/spark/commit/11dc23809e457d7f53dc69947e778ea13538aab0).




[GitHub] spark issue #13107: [SPARK-13850] Force the sorter to Spill when number of e...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13107
  
    **[Test build #61423 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61423/consoleFull)** for PR 13107 at commit [`cab4357`](https://github.com/apache/spark/commit/cab43577b53b8a9bb3562947256658389ffa6ad8).




[GitHub] spark pull request #13107: [SPARK-13850] Force the sorter to Spill when numb...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13107#discussion_r68856272
  
    --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
    @@ -72,7 +72,10 @@
       private final TaskContext taskContext;
       private final ShuffleWriteMetrics writeMetrics;
     
    -  /** Force this sorter to spill when there are this many elements in memory. For testing only */
    +  /**
    +   * Force this sorter to spill when there are this many elements in memory. The default value is
    +   * 1024 * 1024 * 1024 / 2 which allows the maximum size of the pointer array to be 8G.
    --- End diff --
    
    I see, fixed that, thanks




[GitHub] spark issue #13107: [SPARK-13850] Force the sorter to Spill when number of e...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the issue:

    https://github.com/apache/spark/pull/13107
  
    @rxin - In addition to @davies's point, this feature also controls the largest contiguous memory block allocated on the heap, which is very useful for avoiding OOMs when operating on large data sets. We have been seeing executor OOMs caused by failures to allocate a large contiguous buffer in memory because of fragmentation.




[GitHub] spark pull request #13107: [SPARK-13850] Force the sorter to Spill when numb...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/13107




[GitHub] spark pull request #13107: [SPARK-13850] Force the sorter to Spill when numb...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13107#discussion_r65928165
  
    --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
    @@ -72,7 +72,11 @@
       private final TaskContext taskContext;
       private final ShuffleWriteMetrics writeMetrics;
     
    -  /** Force this sorter to spill when there are this many elements in memory. For testing only */
    +  /**
    +   * Force this sorter to spill when there are this many elements in memory. This is to workaround
    +   * the issue of timSort failing on large buffer size (SPARK-13850). The default value is
    +   * 1024 * 1024 * 1024 / 8 which allows the maximum size of the pointer array to be 1G.
    --- End diff --
    
    That bug is fixed; could you update the comment (or change 1G to 8G)?




[GitHub] spark issue #13107: [SPARK-13850] Force the sorter to Spill when number of e...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13107
  
    **[Test build #3133 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3133/consoleFull)** for PR 13107 at commit [`c5f5a69`](https://github.com/apache/spark/commit/c5f5a69413baa7e30abb4c2b5cf1d826429368d3).




[GitHub] spark issue #13107: [SPARK-13850] Force the sorter to Spill when number of e...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13107
  
    **[Test build #3133 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3133/consoleFull)** for PR 13107 at commit [`c5f5a69`](https://github.com/apache/spark/commit/c5f5a69413baa7e30abb4c2b5cf1d826429368d3).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #13107: [SPARK-13850] Force the sorter to Spill when number of e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13107
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61549/
    Test PASSed.




[GitHub] spark pull request: [SPARK-13850] Force the sorter to Spill when n...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13107#discussion_r63267469
  
    --- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java ---
    @@ -143,6 +151,8 @@ private UnsafeExternalSorter(
           this.inMemSorter = existingInMemorySorter;
         }
         this.peakMemoryUsedBytes = getMemoryUsage();
    +    this.numElementsForSpillThreshold =
    +        SparkEnv.get().conf().getLong("spark.shuffle.spill.numElementsForceSpillThreshold", Long.MAX_VALUE);
    --- End diff --
    
    I guess the continuation line might have to be indented with two spaces.
    
    ```java
    this.numElementsForSpillThreshold =
      SparkEnv.get().conf().getLong("spark.shuffle.spill.numElementsForceSpillThreshold", Long.MAX_VALUE);
    ```




[GitHub] spark issue #13107: [SPARK-13850] Force the sorter to Spill when number of e...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the issue:

    https://github.com/apache/spark/pull/13107
  
    @sitalkedia I think it will trigger a full GC and eventually spill in that case. Could you provide more information on that (stack trace or logging)?




[GitHub] spark pull request: [SPARK-13850] Force the sorter to Spill when n...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13107#issuecomment-219151861
  
    Can one of the admins verify this patch?




[GitHub] spark issue #13107: [SPARK-13850] Force the sorter to Spill when number of e...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the issue:

    https://github.com/apache/spark/pull/13107
  
    LGTM, 
    Merging this into master and 2.0, thanks!




[GitHub] spark pull request: [SPARK-13850] Force the sorter to Spill when n...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/13107#issuecomment-221991347
  
    Do we still need this pull request given the other two patches?
    
    https://github.com/apache/spark/pull/13336
    
    https://github.com/apache/spark/pull/13318




[GitHub] spark issue #13107: [SPARK-13850] Force the sorter to Spill when number of e...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13107
  
    **[Test build #61484 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61484/consoleFull)** for PR 13107 at commit [`11dc238`](https://github.com/apache/spark/commit/11dc23809e457d7f53dc69947e778ea13538aab0).
     * This patch **fails from timeout after a configured wait of \`250m\`**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #13107: [SPARK-13850] Force the sorter to Spill when number of e...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13107
  
    **[Test build #61416 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61416/consoleFull)** for PR 13107 at commit [`ed657a3`](https://github.com/apache/spark/commit/ed657a3e8f69a68a7c5352c8672c879cd6de672c).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #13107: [SPARK-13850] Force the sorter to Spill when number of e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13107
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61372/
    Test FAILed.




[GitHub] spark issue #13107: [SPARK-13850] Force the sorter to Spill when number of e...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the issue:

    https://github.com/apache/spark/pull/13107
  
    jenkins retest this please.




[GitHub] spark issue #13107: [SPARK-13850] Force the sorter to Spill when number of e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13107
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61416/
    Test FAILed.




[GitHub] spark issue #13107: [SPARK-13850] Force the sorter to Spill when number of e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13107
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #13107: [SPARK-13850] Force the sorter to Spill when number of e...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13107
  
    **[Test build #61416 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61416/consoleFull)** for PR 13107 at commit [`ed657a3`](https://github.com/apache/spark/commit/ed657a3e8f69a68a7c5352c8672c879cd6de672c).




[GitHub] spark pull request: [SPARK-13850] Force the sorter to Spill when n...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the pull request:

    https://github.com/apache/spark/pull/13107#issuecomment-219762996
  
    I am not 100% sure of the root cause, but I suspect this happens when the JVM tries to allocate a very large buffer for the pointer array. The JVM may not be able to allocate such a large buffer in a contiguous memory location on the heap, and since the unsafe operations assume that objects are contiguous in memory, any unsafe operation on the large buffer results in memory corruption, which manifests as the TimSort issue. 
    
    Unfortunately, this issue is not consistently reproducible and I am not sure of the root cause, so I am not sure how we can have a regression test for it.
    
    Also, please note that this change itself is a no-op unless you override the default value of `numElementsForSpillThreshold`, which is `Long.MAX_VALUE`. 
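
To illustrate the point about the default, here is a minimal sketch of a count-based spill check. `ThresholdSorterSketch` and its methods are hypothetical stand-ins written for this example, not Spark's sorter classes; only the name `numElementsForSpillThreshold` and the `Long.MAX_VALUE` default come from the comment above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a sorter; not Spark's actual implementation. */
final class ThresholdSorterSketch {
  private final long numElementsForSpillThreshold;
  private final List<Long> inMemory = new ArrayList<>();
  private int spillCount = 0;

  ThresholdSorterSketch(long numElementsForSpillThreshold) {
    this.numElementsForSpillThreshold = numElementsForSpillThreshold;
  }

  /** Adds one record pointer, spilling first if the in-memory count has reached the threshold. */
  void insertRecord(long recordPointer) {
    if (inMemory.size() >= numElementsForSpillThreshold) {
      spill();
    }
    inMemory.add(recordPointer);
  }

  /** A real sorter would sort and write to disk here; the sketch only clears and counts. */
  private void spill() {
    inMemory.clear();
    spillCount++;
  }

  int spillCount() {
    return spillCount;
  }

  int inMemoryCount() {
    return inMemory.size();
  }

  public static void main(String[] args) {
    ThresholdSorterSketch bounded = new ThresholdSorterSketch(3);
    ThresholdSorterSketch unbounded = new ThresholdSorterSketch(Long.MAX_VALUE);
    for (long i = 0; i < 7; i++) {
      bounded.insertRecord(i);
      unbounded.insertRecord(i);
    }
    System.out.println("bounded spills: " + bounded.spillCount());
    System.out.println("unbounded spills: " + unbounded.spillCount());
  }
}
```

With a threshold of `Long.MAX_VALUE` the check can never fire for any realistic record count, which is why the change behaves as a no-op until the value is overridden.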




[GitHub] spark pull request #13107: [SPARK-13850] Force the sorter to Spill when numb...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13107#discussion_r68620470
  
    --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
    @@ -72,7 +72,10 @@
       private final TaskContext taskContext;
       private final ShuffleWriteMetrics writeMetrics;
     
    -  /** Force this sorter to spill when there are this many elements in memory. For testing only */
    +  /**
    +   * Force this sorter to spill when there are this many elements in memory. The default value is
    +   * 1024 * 1024 * 1024 / 2 which allows the maximum size of the pointer array to be 8G.
    --- End diff --
    
    This one should be `1024 * 1024 * 1024` (8G)




[GitHub] spark issue #13107: [SPARK-13850] Force the sorter to Spill when number of e...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13107
  
    **[Test build #61372 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61372/consoleFull)** for PR 13107 at commit [`c5f5a69`](https://github.com/apache/spark/commit/c5f5a69413baa7e30abb4c2b5cf1d826429368d3).




[GitHub] spark pull request: [SPARK-13850] Force the sorter to Spill when n...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/13107#issuecomment-221426056
  
    Should we have a default value that's not Long.MAX_VALUE for this? What values do you guys typically set?





[GitHub] spark issue #13107: [SPARK-13850] Force the sorter to Spill when number of e...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13107
  
    **[Test build #61372 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61372/consoleFull)** for PR 13107 at commit [`c5f5a69`](https://github.com/apache/spark/commit/c5f5a69413baa7e30abb4c2b5cf1d826429368d3).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-13850] Force the sorter to Spill when n...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the pull request:

    https://github.com/apache/spark/pull/13107#issuecomment-219152163
  
    cc- @srowen, @JoshRosen 




[GitHub] spark issue #13107: [SPARK-13850] Force the sorter to Spill when number of e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13107
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-13850] Force the sorter to Spill when n...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/13107#issuecomment-221993686
  
    @rxin After fixing those two, we still have some other limits (the number of elements should be less than 512M), especially for on-heap mode. They are:
    1) The largest memory block is 8G in on-heap mode, so the number of elements should be less than 512M.
    2) For both radix sort and TimSort, the underlying array cannot be larger than 2G (or it overflows), so the number of records should be less than 1G.
    3) For the sorted iterator, the underlying array cannot be larger than 2G.
    
    So we still need to check the number of elements and then spill, or the job could fail in unexpected ways.
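
The arithmetic behind limits 1) and 2) can be sketched as follows. This is only a worked example: the 16 bytes per element is taken from the review comment elsewhere in this thread, while reading "2G" as array slots with two slots per record is an assumption made for illustration. `SorterLimitsSketch` and its method names are hypothetical.

```java
/** Worked arithmetic for the limits listed above; names are hypothetical. */
final class SorterLimitsSketch {
  static final long GIB = 1024L * 1024L * 1024L;

  /** Number of elements that fit in one memory block of the given size. */
  static long maxElementsPerBlock(long blockBytes, long bytesPerElement) {
    return blockBytes / bytesPerElement;
  }

  /** Number of records that fit in an array of the given length (assumed slots per record). */
  static long maxRecordsPerArray(long arraySlots, long slotsPerRecord) {
    return arraySlots / slotsPerRecord;
  }

  public static void main(String[] args) {
    // Limit 1: an 8G block at 16 bytes per element.
    System.out.println(maxElementsPerBlock(8 * GIB, 16));
    // Limit 2: a 2G-slot array at an assumed two slots per record.
    System.out.println(maxRecordsPerArray(2 * GIB, 2));
  }
}
```

Under these assumptions the first call gives 512M elements and the second gives 1G records, matching the figures quoted in the comment.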




[GitHub] spark issue #13107: [SPARK-13850] Force the sorter to Spill when number of e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13107
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #13107: [SPARK-13850] Force the sorter to Spill when numb...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13107#discussion_r65928927
  
    --- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java ---
    @@ -143,6 +151,8 @@ private UnsafeExternalSorter(
           this.inMemSorter = existingInMemorySorter;
         }
         this.peakMemoryUsedBytes = getMemoryUsage();
    +    this.numElementsForSpillThreshold =
    +      SparkEnv.get().conf().getLong("spark.shuffle.spill.numElementsForceSpillThreshold", (1024 * 1024 * 1024 / 8));
    --- End diff --
    
    numElementsForSpillThreshold should be no greater than 1024 * 1024 * 1024 / 4, since each element requires 16 bytes.
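
As a rough check of the sizes involved, the sketch below converts an element-count threshold into pointer array bytes, using the 16 bytes per element stated above. `PointerArraySizeSketch` is a hypothetical helper for this example, not part of Spark. The multiplication is done in `long` because the byte counts exceed the `int` range.

```java
/** Converts a spill threshold (element count) into pointer array bytes; hypothetical helper. */
final class PointerArraySizeSketch {
  /** Bytes needed for numElements entries; throws ArithmeticException on long overflow. */
  static long pointerArrayBytes(long numElements, long bytesPerElement) {
    return Math.multiplyExact(numElements, bytesPerElement);
  }

  public static void main(String[] args) {
    // The int expressions below stay within range: 1024 * 1024 * 1024 is 2^30.
    long suggestedUpperBound = 1024 * 1024 * 1024 / 4;
    long defaultInDiff = 1024 * 1024 * 1024 / 8;
    System.out.println(pointerArrayBytes(suggestedUpperBound, 16)); // bytes at the upper bound
    System.out.println(pointerArrayBytes(defaultInDiff, 16));       // bytes at the diff's default
  }
}
```

At 16 bytes per element, the suggested upper bound corresponds to a 4G pointer array and the default in the diff to a 2G one.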




[GitHub] spark issue #13107: [SPARK-13850] Force the sorter to Spill when number of e...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the issue:

    https://github.com/apache/spark/pull/13107
  
    Fixed test cases. 




[GitHub] spark issue #13107: [SPARK-13850] Force the sorter to Spill when number of e...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13107
  
    **[Test build #61484 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61484/consoleFull)** for PR 13107 at commit [`11dc238`](https://github.com/apache/spark/commit/11dc23809e457d7f53dc69947e778ea13538aab0).




[GitHub] spark pull request: [SPARK-13850] Force the sorter to Spill when n...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/13107#issuecomment-221476424
  
    TimSort requires a temporary buffer to store the shorter part, which in the worst case can be half the size of the pointer array. This depends on the original order of the rows, so it's pretty hard to reproduce. I hit that twice and have a patch, but can't reproduce it anymore (without the patch).
    
    A better solution would be to use only 2/3 of the pointer array and leave 1/3 as a temporary buffer for TimSort.
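
The 2/3 figure follows from the worst case described above: if n elements are sorted and the temporary buffer may need up to n / 2 slots, then n + n / 2 must fit in the array, so n is at most 2/3 of its capacity. A minimal sketch of that arithmetic, with hypothetical names that are not taken from Spark:

```java
/** Splits a pointer array between sortable elements and a TimSort temp buffer; hypothetical. */
final class TimSortBufferSketch {
  /** Largest element count that leaves room for the worst-case temporary buffer. */
  static long usableElements(long capacity) {
    return capacity * 2 / 3;
  }

  /** Worst-case temporary buffer size: the shorter part is at most half of the elements. */
  static long worstCaseTempBuffer(long numElements) {
    return numElements / 2;
  }

  public static void main(String[] args) {
    long capacity = 12;
    long usable = usableElements(capacity);
    long temp = worstCaseTempBuffer(usable);
    // The elements plus the worst-case temp buffer never exceed the capacity.
    System.out.println(usable + " + " + temp + " <= " + capacity);
  }
}
```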




[GitHub] spark pull request #13107: [SPARK-13850] Force the sorter to Spill when numb...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13107#discussion_r67391030
  
    --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
    @@ -113,7 +117,7 @@
         // Use getSizeAsKb (not bytes) to maintain backwards compatibility if no units are provided
         this.fileBufferSizeBytes = (int) conf.getSizeAsKb("spark.shuffle.file.buffer", "32k") * 1024;
         this.numElementsForSpillThreshold =
    -      conf.getLong("spark.shuffle.spill.numElementsForceSpillThreshold", Long.MAX_VALUE);
    +      conf.getLong("spark.shuffle.spill.numElementsForceSpillThreshold", (1024 * 1024 * 1024 / 8));
    --- End diff --
    
    changed.

