You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by sitalkedia <gi...@git.apache.org> on 2016/04/10 10:09:43 UTC

[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

GitHub user sitalkedia opened a pull request:

    https://github.com/apache/spark/pull/12285

    [SPARK-14363] Fix executor OOM due to memory leak in the Sorter

    ## What changes were proposed in this pull request?
    
    Fix memory leak in the Sorter. When the UnsafeExternalSorter spills the data to disk, it does not free up the underlying pointer array. As a result, we see a lot of executor OOM and also memory under utilization.
    This is a regression partially introduced in PR https://github.com/apache/spark/pull/9241
    
    ## How was this patch tested?
    
    Tested by running a job and observed around 30% speedup after this change.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sitalkedia/spark executor_oom

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12285.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12285
    
----
commit 69ca097aba076013a0b4e1f25f8da7b5568c4d55
Author: Sital Kedia <sk...@fb.com>
Date:   2016-04-10T08:05:53Z

    [SPARK-14363] Fix executor OOM due to memory leak in the Sorter

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12285#issuecomment-209086388
  
    **[Test build #55627 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55627/consoleFull)** for PR 12285 at commit [`b102c25`](https://github.com/apache/spark/commit/b102c257661b7116637cdfa128a66211ce3f52da).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12285#discussion_r59324765
  
    --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java ---
    @@ -69,7 +72,11 @@ public int numRecords() {
         return pos;
       }
     
    -  public void reset() {
    +  public void freeMemoryAndResetPointerArray() {
    --- End diff --
    
    good idea, will do.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12285#issuecomment-207942129
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/12285


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12285#issuecomment-208648111
  
    **[Test build #55561 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55561/consoleFull)** for PR 12285 at commit [`c318a35`](https://github.com/apache/spark/commit/c318a358ac712bca81bb38c9745a55dfac08b931).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12285#issuecomment-208594887
  
    **[Test build #55544 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55544/consoleFull)** for PR 12285 at commit [`707c9bc`](https://github.com/apache/spark/commit/707c9bc7a91ae265d33f3e7643b17b96b813b0bf).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the pull request:

    https://github.com/apache/spark/pull/12285#issuecomment-209095552
  
    @davies - no issues, I will change it back. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12285#discussion_r59453967
  
    --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
    @@ -255,6 +253,10 @@ public long spill(long size, MemoryConsumer trigger) throws IOException {
     
         writeSortedFile(false);
         final long spillSize = freeMemory();
    +    inMemSorter.reset();
    +    // Reset the in-memory sorter's pointer array only after freeing up the memory pages holding the
    --- End diff --
    
    We can move this comment into reset()


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/12285#issuecomment-209115364
  
    LGTM, will merging once passing the tests. Thanks for working on it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12285#issuecomment-208638656
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55544/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12285#discussion_r59324092
  
    --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java ---
    @@ -69,7 +72,11 @@ public int numRecords() {
         return pos;
       }
     
    -  public void reset() {
    +  public void freeMemoryAndResetPointerArray() {
    --- End diff --
    
    Or call it shrinkMemory() and return the size of freed memory?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12285#issuecomment-209136475
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12285#issuecomment-209136189
  
    **[Test build #55646 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55646/consoleFull)** for PR 12285 at commit [`d89adf8`](https://github.com/apache/spark/commit/d89adf8e514afb502b8adbac050a32c1672549ff).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/12285#issuecomment-208671315
  
    @sitalkedia I think this is not a memory leak, it just does not release the memory as soon as possible. What does you plan looks like?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12285#discussion_r59454598
  
    --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
    @@ -255,6 +253,10 @@ public long spill(long size, MemoryConsumer trigger) throws IOException {
     
         writeSortedFile(false);
         final long spillSize = freeMemory();
    +    inMemSorter.reset();
    --- End diff --
    
    Yes, we need to reset the pointer array only after freeing up the memory pages holding records. Otherwise it might happen that the task might not get memory for the pointer array if it is already holding a lot of memory. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12285#discussion_r59325160
  
    --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java ---
    @@ -69,7 +72,11 @@ public int numRecords() {
         return pos;
       }
     
    -  public void reset() {
    +  public void freeMemoryAndResetPointerArray() {
    +    if (consumer != null) {
    +      consumer.freeArray(array);
    +      this.array = consumer.allocateArray(initialSize);
    --- End diff --
    
    This should never fail because at this point, the task should not be holding any memory and we are just allocating the pointer array of initial size, right? 
    
    Also, `allocateArray` will internally call `allocatePage` of the  `TaskMemoryManager` which grabs the lock anyway. So grabbing that lock here wont make sense.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/12285#issuecomment-208594142
  
    ok to test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12285#issuecomment-208648904
  
    **[Test build #55561 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55561/consoleFull)** for PR 12285 at commit [`c318a35`](https://github.com/apache/spark/commit/c318a358ac712bca81bb38c9745a55dfac08b931).
     * This patch **fails build dependency tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/12285#issuecomment-208718277
  
    That make sense, thanks for the explanation.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12285#discussion_r59327396
  
    --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java ---
    @@ -69,7 +72,11 @@ public int numRecords() {
         return pos;
       }
     
    -  public void reset() {
    +  public void freeMemoryAndResetPointerArray() {
    +    if (consumer != null) {
    +      consumer.freeArray(array);
    +      this.array = consumer.allocateArray(initialSize);
    --- End diff --
    
    @davies - Thanks for your input. Even if another task grabs all the memory freed by the `consumer.freeArray(array)`,  `consumer.allocateArray(initialSize)` should always succeed because the `MemoryManager` guarantees that each task is allocated atleast 1/2N of the total shuffle memory, right? 
    
    Manually releasing the array, bypassing the `Memorymanager` does not sound like a very clean approach to me. Also how would I do this `consumer.allocateArray(initialSize)` bypassing the memory manager?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #12285: [SPARK-14363] Fix executor OOM due to memory leak in the...

Posted by notflorian <gi...@git.apache.org>.
Github user notflorian commented on the issue:

    https://github.com/apache/spark/pull/12285
  
    Sorry to camp this old issue.
    
    With spark 1.6.1, I've a lot of executor OOM when I do a rdd.saveAsHadoopFile, with parameter classOf[GzipCodec], to gzip the outpout files. 
    
    Without the GzipCodec parameter it works fine.
    
    Do you think the root cause of my issue is also this memory leak in the Sorter?
    
    Thanks a lot for your help.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/12285#issuecomment-208710183
  
    In your case, inside sorting, the key has 4 columns, the row has 6 columns, so each pair will need about 90 bytes, the array used by sort needs 16 bytes, so the memory used by array should be 15% of all memory used by execution. In worse case, free the array finally could waste about 15% of the memory, how can it make that big difference?
    
    If your data set is huge, which requires spill multiple times, the size of spilled data should be similar each time, so the required array should be similar. If we free that finally, we don't need to grow the array in the middle or two spills (the grow require 50% more memory for array), that's the reason I changed to free the array finally.
    
    The reason your job will OOM is that  the memory used by Hive UDAF `UDAFCollectMap` is not managed by Spark, the better solution could be reduce the memory faction for Spark to leave more memory for UDAFCollectMap. After this patch, you may still see OOM, if UDAFCollectMap use even more memory.
    
    I agreed that the current patch is good (try to free memory as much as it can). Just try to understand more, please correct me if something is wrong.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12285#issuecomment-209086857
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the pull request:

    https://github.com/apache/spark/pull/12285#issuecomment-209115656
  
    Thanks a lot for your quick review and response :). 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the pull request:

    https://github.com/apache/spark/pull/12285#issuecomment-208649861
  
    @andrewor14 - Seems like some transient Jenkins failure. Can we rerun the test?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12285#discussion_r59324780
  
    --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
    @@ -255,6 +253,10 @@ public long spill(long size, MemoryConsumer trigger) throws IOException {
     
         writeSortedFile(false);
         final long spillSize = freeMemory();
    +    // Reset the in-memory sorter's pointer array only after freeing up the memory pages holding the
    +    // records. Otherwise, if the task is over allocated memory, then without freeing the memory pages,
    +    // we might not be able to get memory for the pointer array.
    +    inMemSorter.freeMemoryAndResetPointerArray();
         taskContext.taskMetrics().incMemoryBytesSpilled(spillSize);
    --- End diff --
    
    will do.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12285#issuecomment-208638218
  
    **[Test build #55544 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55544/consoleFull)** for PR 12285 at commit [`707c9bc`](https://github.com/apache/spark/commit/707c9bc7a91ae265d33f3e7643b17b96b813b0bf).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12285#discussion_r59328730
  
    --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java ---
    @@ -69,7 +72,11 @@ public int numRecords() {
         return pos;
       }
     
    -  public void reset() {
    +  public void freeMemoryAndResetPointerArray() {
    +    if (consumer != null) {
    +      consumer.freeArray(array);
    +      this.array = consumer.allocateArray(initialSize);
    --- End diff --
    
    Good point, I forgot that, that make sense.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12285#issuecomment-209098973
  
    **[Test build #55646 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55646/consoleFull)** for PR 12285 at commit [`d89adf8`](https://github.com/apache/spark/commit/d89adf8e514afb502b8adbac050a32c1672549ff).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12285#issuecomment-208718272
  
    **[Test build #2777 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2777/consoleFull)** for PR 12285 at commit [`c318a35`](https://github.com/apache/spark/commit/c318a358ac712bca81bb38c9745a55dfac08b931).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12285#discussion_r59325768
  
    --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java ---
    @@ -69,7 +72,11 @@ public int numRecords() {
         return pos;
       }
     
    -  public void reset() {
    +  public void freeMemoryAndResetPointerArray() {
    +    if (consumer != null) {
    +      consumer.freeArray(array);
    +      this.array = consumer.allocateArray(initialSize);
    --- End diff --
    
    Considering that another concurrent task could grab all the memory freed by `consumer.freeArray(array)`.
    
    Another option could be manually release the array, then call releaseMemory() by passing the different of the two.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12285#discussion_r59459398
  
    --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
    @@ -255,6 +253,10 @@ public long spill(long size, MemoryConsumer trigger) throws IOException {
     
         writeSortedFile(false);
         final long spillSize = freeMemory();
    +    inMemSorter.reset();
    +    // Reset the in-memory sorter's pointer array only after freeing up the memory pages holding the
    --- End diff --
    
    OK, I don't have strong opinion on it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12285#issuecomment-209027227
  
    **[Test build #55626 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55626/consoleFull)** for PR 12285 at commit [`75a44f9`](https://github.com/apache/spark/commit/75a44f9647e59b41f5ee38b225a93c3663bbb27f).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12285#issuecomment-209136477
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55646/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12285#discussion_r59323906
  
    --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java ---
    @@ -69,7 +72,11 @@ public int numRecords() {
         return pos;
       }
     
    -  public void reset() {
    +  public void freeMemoryAndResetPointerArray() {
    +    if (consumer != null) {
    +      consumer.freeArray(array);
    +      this.array = consumer.allocateArray(initialSize);
    --- End diff --
    
    There is very small chance that this could fail, should we grab a lock from TaskMemoryManager for these two?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12285#issuecomment-208648920
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55561/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the pull request:

    https://github.com/apache/spark/pull/12285#issuecomment-209026557
  
    @davies - Thanks for the review. I have addressed all the comments, please let me know how it looks. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the pull request:

    https://github.com/apache/spark/pull/12285#issuecomment-208702898
  
    @davies - Thanks for looking into it. I agree with you that its not a memory leak because that memory may be used later. However, not reducing the pointer array size to the initial size in case of spill is causing heavy memory underutilization because the tasks are not able to get sufficient memory for the storing the records and this situation often lead to the executor OOM.  Also, I don't see any reason why would we want to keep the bloated pointer array if we are spilling all data to disk and not have anything to store in the pointer array. This change is restoring the behavior of the sorter before the PR #9241 in https://github.com/apache/spark/pull/9241/files#diff-3eedc75de4787b842477138d8cc7f150L321. 
    
    The physical plan looks something like this -
    
    ```
    == Physical Plan ==
    SortBasedAggregate(key=[shard_id#7L,id#11L,target#9,target_id#12L], functions=[(hiveudaffunction(HiveFunctionWrapper(UDAFCollectMap@270df931),feature_id#10,feature_value#13,false,0,0),mode=Complete,isDistinct=false)], output=[shard_id#7L,id#11L,target#9,target_id#12L,feature_map#14])
    +- ConvertToSafe
       +- Sort [shard_id#7L ASC,id#11L ASC,target#9 ASC,target_id#12L ASC], false, 0
          +- TungstenExchange hashpartitioning(shard_id#7L,321), None
             +- Project [(id#11L % 321) AS shard_id#7L,id#11L,target#9,target_id#12L,feature_id#10,feature_value#13]
                +- Filter ((((id#11L > 0) && (target_id#12L > 0)) && NOT (id#11L = target_id#12L)) && (cast(feature_value#13 as double) > 0.001))
                   +- HiveTableScan [id#11L,feature_value#13,feature_id#10,target#9,target_id#12L], MetastoreRelation x, y, None, [(ds#8 = 2016-03-20),target#9 IN (1,2,3),feature_id#10 INSET (1, 2)]
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12285#issuecomment-209027266
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12285#discussion_r59447204
  
    --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java ---
    @@ -69,8 +72,15 @@ public int numRecords() {
         return pos;
       }
     
    -  public void reset() {
    +  public long shrinkMemory() {
    --- End diff --
    
    This function is the only thing we need to change, can we still call it `reset` (other do not need to change the caller)?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12285#issuecomment-209086859
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55627/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/12285#issuecomment-209089296
  
    @sitalkedia Sorry for the trouble.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12285#discussion_r59323954
  
    --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
    @@ -255,6 +253,10 @@ public long spill(long size, MemoryConsumer trigger) throws IOException {
     
         writeSortedFile(false);
         final long spillSize = freeMemory();
    +    // Reset the in-memory sorter's pointer array only after freeing up the memory pages holding the
    +    // records. Otherwise, if the task is over allocated memory, then without freeing the memory pages,
    +    // we might not be able to get memory for the pointer array.
    +    inMemSorter.freeMemoryAndResetPointerArray();
         taskContext.taskMetrics().incMemoryBytesSpilled(spillSize);
    --- End diff --
    
    We should count the size freed from array


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the pull request:

    https://github.com/apache/spark/pull/12285#issuecomment-207944993
  
    cc @davies 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the pull request:

    https://github.com/apache/spark/pull/12285#issuecomment-208648416
  
    @andrewor14 - Thanks for taking a look. Handled the test case failures.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12285#discussion_r59455753
  
    --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
    @@ -255,6 +253,10 @@ public long spill(long size, MemoryConsumer trigger) throws IOException {
     
         writeSortedFile(false);
         final long spillSize = freeMemory();
    +    inMemSorter.reset();
    +    // Reset the in-memory sorter's pointer array only after freeing up the memory pages holding the
    --- End diff --
    
    IMO, comment in `ShuffleExternalSorter` makes it easier to get the context and understand. Also in future, if some one tries to move this call, he will not do so, seeing the comment. If the comment is in the `reset()` function, someone might inadvertently move this call without seeing the comment in `reset()` function.  However, if you have a strong opinion about it, I would gladly move the comment into `reset()`. Let me know what you think. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12285#issuecomment-209027762
  
    **[Test build #55627 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55627/consoleFull)** for PR 12285 at commit [`b102c25`](https://github.com/apache/spark/commit/b102c257661b7116637cdfa128a66211ce3f52da).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12285#discussion_r59447040
  
    --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
    @@ -254,7 +252,11 @@ public long spill(long size, MemoryConsumer trigger) throws IOException {
           spills.size() > 1 ? " times" : " time");
     
         writeSortedFile(false);
    -    final long spillSize = freeMemory();
    +    long spillSize = freeMemory();
    --- End diff --
    
    I'm sorry that I misread this last night, this is spillSize (the number of bytes written into disk), not the amount of freed memory, so we don't need to add the amount from inMemSorter.
    
    Sorry again.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12285#discussion_r59323828
  
    --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java ---
    @@ -69,7 +72,11 @@ public int numRecords() {
         return pos;
       }
     
    -  public void reset() {
    +  public void freeMemoryAndResetPointerArray() {
    --- End diff --
    
    I think we can still call it `reset`, right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12285#discussion_r59453889
  
    --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
    @@ -255,6 +253,10 @@ public long spill(long size, MemoryConsumer trigger) throws IOException {
     
         writeSortedFile(false);
         final long spillSize = freeMemory();
    +    inMemSorter.reset();
    --- End diff --
    
    Do we really need to move this call?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12285#issuecomment-209027268
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55626/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12285#issuecomment-208648915
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12285#issuecomment-209023737
  
    **[Test build #55626 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55626/consoleFull)** for PR 12285 at commit [`75a44f9`](https://github.com/apache/spark/commit/75a44f9647e59b41f5ee38b225a93c3663bbb27f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12285#issuecomment-208638649
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/12285#issuecomment-209145062
  
    Merged into master and 1.6 branch (fixed the conflicts)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the pull request:

    https://github.com/apache/spark/pull/12285#issuecomment-208717200
  
    @davies Thanks for the explanation, your calculation makes sense. You are right that freeing the array can only make a difference of 15% in ideal case. But what we are experiencing is something different. 
    
    Consider the following scenario - We have total shuffle memory of 10G available for 5 tasks. So in ideal situation, each tasks should be assigned 2G of shuffle memory each. And out of those 2G, 300MB should be allocated to the pointer array and rest for storing the records. Now lets say 3 of the tasks finish at the same time and before the driver could run additional tasks on the executor, rest 2 running task aggressively expend their memory and take upto 5G of shuffle memory on the executor, resulting in the pointer array size of around 750MB. Now when the driver runs additional 3 tasks on the executor, previous 2 tasks will be forced to spill, but the pointer array of size 750MB is never freed. This will result in heavy underutilization of memory for the task and in cases where the pointer array actually grew more than the fair share memory of the task, it will result in executor OOM and causing all other tasks on the executor to die.
    
    The job we are running is processing a huge data set of size more than 50TB and we were seeing more than 5% task failure due to OOM. After this fix, we are the failure rate has come down to less than 0.01% and we gain a massive 30% cpu speedup. 
    
    
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14363] Fix executor OOM due to memory l...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12285#issuecomment-208763429
  
    **[Test build #2777 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2777/consoleFull)** for PR 12285 at commit [`c318a35`](https://github.com/apache/spark/commit/c318a358ac712bca81bb38c9745a55dfac08b931).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org