You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by j-baker <gi...@git.apache.org> on 2017/07/05 14:33:39 UTC

[GitHub] spark pull request #18543: [SPARK-21319][SQL] Fix memory leak in UnsafeExter...

GitHub user j-baker opened a pull request:

    https://github.com/apache/spark/pull/18543

    [SPARK-21319][SQL] Fix memory leak in UnsafeExternalRowSorter.RowComparator

    ## What changes were proposed in this pull request?
    
    UnsafeExternalRowSorter.RowComparator contains references to the objects
    backing the last arrays sorted. This causes memory leaks, since those
    objects become deallocated but are still live.
    
    We make sure to unset them.
    
    ## How was this patch tested?
    
    Not sure how to explicitly test this. Would appreciate guidance on what kind of testing might be necessary here. Evidence for the bug is provided in the JIRA ticket.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/j-baker/spark jbaker/fix_memory_leak

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18543.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18543
    
----
commit b65c85b8c6b5ab5a17f080a6bfa8f9403f4bebe9
Author: James Baker <jb...@palantir.com>
Date:   2017-07-05T14:25:48Z

    SPARK-21319: Fix memory leak in UnsafeExternalRowSorter.RowComparator
    
    UnsafeExternalRowSorter.RowComparator contains references to the objects
    backing the last arrays sorted. This causes memory leaks, since those
    objects become deallocated but are still live.
    
    We make sure to unset them.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18543: [SPARK-21319][SQL] Fix memory leak in UnsafeExternalRowS...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18543
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18543: [SPARK-21319][SQL] Fix memory leak in UnsafeExter...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18543#discussion_r127672102
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/execution/UnsafeExternalRowSorter.java ---
    @@ -211,7 +211,10 @@ public int compare(Object baseObj1, long baseOff1, Object baseObj2, long baseOff
           // TODO: Why are the sizes -1?
           row1.pointTo(baseObj1, baseOff1, -1);
           row2.pointTo(baseObj2, baseOff2, -1);
    -      return ordering.compare(row1, row2);
    +      int comparison = ordering.compare(row1, row2);
    +      row1.pointTo(null, 0L, -1);
    +      row2.pointTo(null, 0L, -1);
    --- End diff --
    
    Good idea @cloud-fan . Looks like `RowComparator` and `KVComparator` could be cleaned up in `UnsafeExternalSorter.cleanupResources` and `UnsafeInMemorySorter.free`. @davies I know this is from a long while back, but does that make sense? Seems like reasonable places to simply 'flush' the references, and won't hurt anything.
    
    I am not 100% sure there's not another path to take care of, or if this frees the ref soon enough to avoid the problem. Thoughts @j-baker ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18543: [SPARK-21319][SQL] Fix memory leak in UnsafeExternalRowS...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/18543
  
    Seems reasonable, though I don't know this code well. There is a similar pattern in KVComparator. I wonder if it's even more efficient to have a `clear()` method to reset state of an `UnsafeRow`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18543: [SPARK-21319][SQL] Fix memory leak in UnsafeExter...

Posted by j-baker <gi...@git.apache.org>.
Github user j-baker commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18543#discussion_r128233945
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/execution/UnsafeExternalRowSorter.java ---
    @@ -211,7 +211,10 @@ public int compare(Object baseObj1, long baseOff1, Object baseObj2, long baseOff
           // TODO: Why are the sizes -1?
           row1.pointTo(baseObj1, baseOff1, -1);
           row2.pointTo(baseObj2, baseOff2, -1);
    -      return ordering.compare(row1, row2);
    +      int comparison = ordering.compare(row1, row2);
    +      row1.pointTo(null, 0L, -1);
    +      row2.pointTo(null, 0L, -1);
    --- End diff --
    
    Then why would we be seeing the OOMs? If at the end of the task the taskcompletionlistener fires and is removed, then the whole comparator becomes unreachable and we have no problem here.
    
    My job looks something like:
    
    dataset.sortWithinPartitions().coalescePartitions() - would not we potentially finish doing some merging before reaching the end of the task?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18543: [SPARK-21319][SQL] Fix memory leak in UnsafeExternalRowS...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18543
  
    **[Test build #79756 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79756/testReport)** for PR 18543 at commit [`05fc618`](https://github.com/apache/spark/commit/05fc618e0c091a21ea3d63aa3ad7a099b8e14c86).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18543: [SPARK-21319][SQL] Fix memory leak in UnsafeExter...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18543#discussion_r128236888
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/execution/UnsafeExternalRowSorter.java ---
    @@ -211,7 +211,10 @@ public int compare(Object baseObj1, long baseOff1, Object baseObj2, long baseOff
           // TODO: Why are the sizes -1?
           row1.pointTo(baseObj1, baseOff1, -1);
           row2.pointTo(baseObj2, baseOff2, -1);
    -      return ordering.compare(row1, row2);
    +      int comparison = ordering.compare(row1, row2);
    +      row1.pointTo(null, 0L, -1);
    +      row2.pointTo(null, 0L, -1);
    --- End diff --
    
    i see, some partition may finish merging but some do not, and the mergers which are finished are still referred and keep the input rows they compared last time.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18543: [SPARK-21319][SQL] Fix memory leak in UnsafeExter...

Posted by j-baker <gi...@git.apache.org>.
Github user j-baker commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18543#discussion_r128174022
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/execution/UnsafeExternalRowSorter.java ---
    @@ -211,7 +211,10 @@ public int compare(Object baseObj1, long baseOff1, Object baseObj2, long baseOff
           // TODO: Why are the sizes -1?
           row1.pointTo(baseObj1, baseOff1, -1);
           row2.pointTo(baseObj2, baseOff2, -1);
    -      return ordering.compare(row1, row2);
    +      int comparison = ordering.compare(row1, row2);
    +      row1.pointTo(null, 0L, -1);
    +      row2.pointTo(null, 0L, -1);
    --- End diff --
    
    I suppose that if you assume that you can only use these sorters once, then you can probably null out the reference to the comparator once you've constructed the UnsafeSorterSpillMerger, and that would also solve the problem I've been seeing (the callback keeping a strong reference).
    
    It'd still feel weird that you have a comparator that could potentially be responsible for the liveness of hundreds of megabytes of memory, though.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18543: [SPARK-21319][SQL] Fix memory leak in UnsafeExter...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18543#discussion_r127763486
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/execution/UnsafeExternalRowSorter.java ---
    @@ -211,7 +211,10 @@ public int compare(Object baseObj1, long baseOff1, Object baseObj2, long baseOff
           // TODO: Why are the sizes -1?
           row1.pointTo(baseObj1, baseOff1, -1);
           row2.pointTo(baseObj2, baseOff2, -1);
    -      return ordering.compare(row1, row2);
    +      int comparison = ordering.compare(row1, row2);
    +      row1.pointTo(null, 0L, -1);
    +      row2.pointTo(null, 0L, -1);
    --- End diff --
    
    @cloud-fan @srowen It is good idea to do this cleanup once at the end. I am curious how to implement this cleanup.
    
    While @srowen proposed to use `nsafeExternalSorter.cleanupResources` and `UnsafeInMemorySorter.free` that will be called when a task is finished, to do cleanup here does not seem to work in this case. This is because [this issue](https://issues.apache.org/jira/browse/SPARK-21319) occurs before completing a task since `UnsafeExternalSorter` instance is registered into the task `taskContext` at [here](https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java#L159-L161). This cleanup approach will not be performed before an OOM occurs during execution of the task.
    
    IIUC, the end of sort is [here](https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorter.java#L345). This line calls [this sort method](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/collection/Sorter.scala#L36). Either to do the cleanup at the first part or to do the cleanup after checking type of a given comparator at the second part could work.
    
    What do you think?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18543: [SPARK-21319][SQL] Fix memory leak in UnsafeExternalRowS...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18543
  
    BTW, since the GC root is `TaskContext`, is it possible to update https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java#L159-L161 to not let `UnsafeExternalSorter` being referred by `TaskContext`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18543: [SPARK-21319][SQL] Fix memory leak in UnsafeExter...

Posted by j-baker <gi...@git.apache.org>.
Github user j-baker commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18543#discussion_r128186956
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/execution/UnsafeExternalRowSorter.java ---
    @@ -211,7 +211,10 @@ public int compare(Object baseObj1, long baseOff1, Object baseObj2, long baseOff
           // TODO: Why are the sizes -1?
           row1.pointTo(baseObj1, baseOff1, -1);
           row2.pointTo(baseObj2, baseOff2, -1);
    -      return ordering.compare(row1, row2);
    +      int comparison = ordering.compare(row1, row2);
    +      row1.pointTo(null, 0L, -1);
    +      row2.pointTo(null, 0L, -1);
    --- End diff --
    
    ...I'll update this PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18543: [SPARK-21319][SQL] Fix memory leak in UnsafeExternalRowS...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18543
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79757/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18543: [SPARK-21319][SQL] Fix memory leak in UnsafeExternalRowS...

Posted by j-baker <gi...@git.apache.org>.
Github user j-baker commented on the issue:

    https://github.com/apache/spark/pull/18543
  
    Can confirm that my job which was exhibiting OOMs no longer OOMs after this change.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18543: [SPARK-21319][SQL] Fix memory leak in UnsafeExternalRowS...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/18543
  
    ok to test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18543: [SPARK-21319][SQL] Fix memory leak in UnsafeExternalRowS...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18543
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79247/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18543: [SPARK-21319][SQL] Fix memory leak in UnsafeExter...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/18543


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18543: [SPARK-21319][SQL] Fix memory leak in UnsafeExter...

Posted by j-baker <gi...@git.apache.org>.
Github user j-baker commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18543#discussion_r128027434
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/execution/UnsafeExternalRowSorter.java ---
    @@ -211,7 +211,10 @@ public int compare(Object baseObj1, long baseOff1, Object baseObj2, long baseOff
           // TODO: Why are the sizes -1?
           row1.pointTo(baseObj1, baseOff1, -1);
           row2.pointTo(baseObj2, baseOff2, -1);
    -      return ordering.compare(row1, row2);
    +      int comparison = ordering.compare(row1, row2);
    +      row1.pointTo(null, 0L, -1);
    +      row2.pointTo(null, 0L, -1);
    --- End diff --
    
    So I feel like if the thing ends up in memory, this is correct - but otherwise the comparator is used in the [UnsafeSorterSpillMerger](https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillMerger.java#L51).
    
    Since we're handing back an iterator, am I right in thinking that without some periodic cleanup task you always stand a risk from this kind of leak unless you clear after each comparison or have some kind of async cleanup task?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18543: [SPARK-21319][SQL] Fix memory leak in UnsafeExternalRowS...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18543
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18543: [SPARK-21319][SQL] Fix memory leak in UnsafeExter...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18543#discussion_r128221419
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/execution/UnsafeExternalRowSorter.java ---
    @@ -211,7 +211,10 @@ public int compare(Object baseObj1, long baseOff1, Object baseObj2, long baseOff
           // TODO: Why are the sizes -1?
           row1.pointTo(baseObj1, baseOff1, -1);
           row2.pointTo(baseObj2, baseOff2, -1);
    -      return ordering.compare(row1, row2);
    +      int comparison = ordering.compare(row1, row2);
    +      row1.pointTo(null, 0L, -1);
    +      row2.pointTo(null, 0L, -1);
    --- End diff --
    
    The merger reads data from spill files lazily, so when the merging finishes, it's end of the task.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18543: [SPARK-21319][SQL] Fix memory leak in UnsafeExternalRowS...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18543
  
    ping


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18543: [SPARK-21319][SQL] Fix memory leak in UnsafeExter...

Posted by j-baker <gi...@git.apache.org>.
Github user j-baker commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18543#discussion_r128186925
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/execution/UnsafeExternalRowSorter.java ---
    @@ -211,7 +211,10 @@ public int compare(Object baseObj1, long baseOff1, Object baseObj2, long baseOff
           // TODO: Why are the sizes -1?
           row1.pointTo(baseObj1, baseOff1, -1);
           row2.pointTo(baseObj2, baseOff2, -1);
    -      return ordering.compare(row1, row2);
    +      int comparison = ordering.compare(row1, row2);
    +      row1.pointTo(null, 0L, -1);
    +      row2.pointTo(null, 0L, -1);
    --- End diff --
    
    maybe it's worth having a clone method on the comparator and making sure we clone the comparator before passing it to anything?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18543: [SPARK-21319][SQL] Fix memory leak in UnsafeExternalRowS...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18543
  
    **[Test build #79756 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79756/testReport)** for PR 18543 at commit [`05fc618`](https://github.com/apache/spark/commit/05fc618e0c091a21ea3d63aa3ad7a099b8e14c86).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18543: [SPARK-21319][SQL] Fix memory leak in UnsafeExternalRowS...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18543
  
    **[Test build #79757 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79757/testReport)** for PR 18543 at commit [`51ec9a0`](https://github.com/apache/spark/commit/51ec9a026b137d843efd1d3a615bed64d88cb4bb).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18543: [SPARK-21319][SQL] Fix memory leak in UnsafeExter...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18543#discussion_r126600430
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/execution/UnsafeExternalRowSorter.java ---
    @@ -211,7 +211,10 @@ public int compare(Object baseObj1, long baseOff1, Object baseObj2, long baseOff
           // TODO: Why are the sizes -1?
           row1.pointTo(baseObj1, baseOff1, -1);
           row2.pointTo(baseObj2, baseOff2, -1);
    -      return ordering.compare(row1, row2);
    +      int comparison = ordering.compare(row1, row2);
    +      row1.pointTo(null, 0L, -1);
    +      row2.pointTo(null, 0L, -1);
    --- End diff --
    
    can we avoid to do it per comparison? is there any places we can do a cleanup at the end?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18543: [SPARK-21319][SQL] Fix memory leak in UnsafeExternalRowS...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18543
  
    **[Test build #79247 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79247/testReport)** for PR 18543 at commit [`5e7a935`](https://github.com/apache/spark/commit/5e7a935089184b77aa5670078c22f03e3b95022e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18543: [SPARK-21319][SQL] Fix memory leak in UnsafeExter...

Posted by j-baker <gi...@git.apache.org>.
Github user j-baker commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18543#discussion_r128186755
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/execution/UnsafeExternalRowSorter.java ---
    @@ -211,7 +211,10 @@ public int compare(Object baseObj1, long baseOff1, Object baseObj2, long baseOff
           // TODO: Why are the sizes -1?
           row1.pointTo(baseObj1, baseOff1, -1);
           row2.pointTo(baseObj2, baseOff2, -1);
    -      return ordering.compare(row1, row2);
    +      int comparison = ordering.compare(row1, row2);
    +      row1.pointTo(null, 0L, -1);
    +      row2.pointTo(null, 0L, -1);
    --- End diff --
    
    But after we finish with the merger, the spill readers will disappear but the last buffers being used in the spill readers will remain, no?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18543: [SPARK-21319][SQL] Fix memory leak in UnsafeExternalRowS...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18543
  
    **[Test build #79247 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79247/testReport)** for PR 18543 at commit [`5e7a935`](https://github.com/apache/spark/commit/5e7a935089184b77aa5670078c22f03e3b95022e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18543: [SPARK-21319][SQL] Fix memory leak in UnsafeExter...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18543#discussion_r128179466
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/execution/UnsafeExternalRowSorter.java ---
    @@ -211,7 +211,10 @@ public int compare(Object baseObj1, long baseOff1, Object baseObj2, long baseOff
           // TODO: Why are the sizes -1?
           row1.pointTo(baseObj1, baseOff1, -1);
           row2.pointTo(baseObj2, baseOff2, -1);
    -      return ordering.compare(row1, row2);
    +      int comparison = ordering.compare(row1, row2);
    +      row1.pointTo(null, 0L, -1);
    +      row2.pointTo(null, 0L, -1);
    --- End diff --
    
    each spill reader in `UnsafeSorterSpillMerger` has a buffer to keep the current record. When the comparator is used in `UnsafeSorterSpillMerger`,  the data it refers is already referred by spill readers, so there is no memory leak problem.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18543: [SPARK-21319][SQL] Fix memory leak in UnsafeExternalRowS...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18543
  
    **[Test build #79757 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79757/testReport)** for PR 18543 at commit [`51ec9a0`](https://github.com/apache/spark/commit/51ec9a026b137d843efd1d3a615bed64d88cb4bb).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18543: [SPARK-21319][SQL] Fix memory leak in UnsafeExternalRowS...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/18543
  
    [This program](https://gist.github.com/kiszk/29ce3558564ccb0b8797dec37b29eb12)  works well with this PR while it causes an infinite loop in org.apache.spark.memory.TaskMemoryManager.allocatePage without this PR due to OOM.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18543: [SPARK-21319][SQL] Fix memory leak in UnsafeExternalRowS...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/18543
  
    @davies or @rxin might be worth a look. The only possible downside here is the extra cycles to clear the reference, so I raise it in case you think we need to find a more optimized way to deal with this. But the change itself looks like a straightforward fix other than that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18543: [SPARK-21319][SQL] Fix memory leak in UnsafeExter...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18543#discussion_r128239090
  
    --- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorter.java ---
    @@ -118,26 +124,28 @@ public int compare(RecordPointerAndKeyPrefix r1, RecordPointerAndKeyPrefix r2) {
       public UnsafeInMemorySorter(
         final MemoryConsumer consumer,
         final TaskMemoryManager memoryManager,
    -    final RecordComparator recordComparator,
    +    final RecordComparator.Factory recordComparatorFactory,
    --- End diff --
    
    Why do we need to change `UnsafeInMemorySorter`. The `TaskContext` refers `UnsafeExternalSorter`, so we only need the comparator factory in `UnsafeExternalSorter`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18543: [SPARK-21319][SQL] Fix memory leak in UnsafeExternalRowS...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18543
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18543: [SPARK-21319][SQL] Fix memory leak in UnsafeExternalRowS...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18543
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79756/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18543: [SPARK-21319][SQL] Fix memory leak in UnsafeExternalRowS...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18543
  
    Hi @j-baker , do you have any code snippet to reproduce this memory leak?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18543: [SPARK-21319][SQL] Fix memory leak in UnsafeExternalRowS...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18543
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org