You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by szhem <gi...@git.apache.org> on 2018/11/19 11:19:15 UTC

[GitHub] spark pull request #23083: [SPARK-26114][CORE] ExternalSorter Leak

GitHub user szhem opened a pull request:

    https://github.com/apache/spark/pull/23083

    [SPARK-26114][CORE] ExternalSorter Leak

    ## What changes were proposed in this pull request?
    
    This pull request fixes [SPARK-26114](https://issues.apache.org/jira/browse/SPARK-26114) issue that occurs when trying to reduce the number of partitions by means of coalesce without shuffling after shuffle-based transformations.
    
    For the following data
    ```scala
    import org.apache.hadoop.io._ 
    import org.apache.hadoop.io.compress._ 
    import org.apache.commons.lang._ 
    import org.apache.spark._ 
    
    // generate 100M records of sample data 
    sc.makeRDD(1 to 1000, 1000) 
      .flatMap(item => (1 to 100000) 
        .map(i => new Text(RandomStringUtils.randomAlphanumeric(3).toLowerCase) -> new Text(RandomStringUtils.randomAlphanumeric(1024)))) 
      .saveAsSequenceFile("/tmp/random-strings", Some(classOf[GzipCodec])) 
    ```
    
    and the following job
    ```scala
    import org.apache.hadoop.io._
    import org.apache.spark._
    import org.apache.spark.storage._
    
    val rdd = sc.sequenceFile("/tmp/random-strings", classOf[Text], classOf[Text])
    rdd 
      .map(item => item._1.toString -> item._2.toString) 
      .repartitionAndSortWithinPartitions(new HashPartitioner(1000)) 
      .coalesce(10,false) 
      .count 
    ```
    
    ... executed like the following
    ```bash
    spark-shell \ 
      --num-executors=5 \ 
      --executor-cores=2 \ 
      --master=yarn \
      --deploy-mode=client \ 
      --conf spark.executor.memory=1g \ 
      --conf spark.dynamicAllocation.enabled=false \
      --conf spark.executor.extraJavaOptions='-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp -Dio.netty.noUnsafe=true'
    ```
    
    ... executors are always failing with OutOfMemoryErrors.
    
    The main issue is multiple leaks of ExternalSorter references.
    For example, in case of 2 tasks per executor it is expected to be 2 simultaneous instances of ExternalSorter per executor but heap dump generated on OutOfMemoryError shows that there are more ones.
    
    ![run1-noparams-dominator-tree-externalsorter](https://user-images.githubusercontent.com/1523889/48703665-782ce580-ec05-11e8-95a9-d6c94e8285ab.png)
    
    
    P.S. This PR does not cover cases with CoGroupedRDDs which use ExternalAppendOnlyMap internally, which itself can lead to OutOfMemoryErrors in many places. 
    
    ## How was this patch tested?
    
    - Existing unit tests
    - New unit tests
    - Job executions on the live environment
    
    Here is the screenshot before applying this patch
    ![run3-noparams-failure-ui-5x2-repartition-and-sort](https://user-images.githubusercontent.com/1523889/48700395-f769eb80-ebfc-11e8-831b-e94c757d416c.png)
    
    Here is the screenshot after applying this patch
    ![run3-noparams-success-ui-5x2-repartition-and-sort](https://user-images.githubusercontent.com/1523889/48700610-7a8b4180-ebfd-11e8-9761-baaf38a58e66.png)
    And in case of reducing the number of executors even more the job is still stable
    ![run3-noparams-success-ui-2x2-repartition-and-sort](https://user-images.githubusercontent.com/1523889/48700619-82e37c80-ebfd-11e8-98ed-a38e1f1f1fd9.png)
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/szhem/spark SPARK-26114-externalsorter-leak

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23083.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #23083
    
----
commit 9ca5ddbeb2022259ec79e60b3e81332466947d90
Author: Sergey Zhemzhitsky <sz...@...>
Date:   2018-11-05T22:31:59Z

    Allow memory consumers and spillables to be optionally unregistered and not trackable on freeing up the memory

commit bf57de2c9d23c3cbac23817a3d0b6158739ae8aa
Author: Sergey Zhemzhitsky <sz...@...>
Date:   2018-11-05T22:34:15Z

    allow to remove already registered task completion listeners if they dont need to be fired at task completion

commit e3531acf2751d4ba63e7fa353b66c70d534271df
Author: Sergey Zhemzhitsky <sz...@...>
Date:   2018-11-05T22:35:31Z

    clean up readingIterator on stop

commit baa9656e9d9fa2d006ed6fb98257a213bcace588
Author: Sergey Zhemzhitsky <sz...@...>
Date:   2018-11-05T22:37:07Z

    prevent capturing and holding sub-iterators after completion

commit 3cc54522545f43e0ee9ff9443ba3ea67e5fb9d5b
Author: Sergey Zhemzhitsky <sz...@...>
Date:   2018-11-05T22:40:13Z

    cleaning up resourses as soon as they no longer needed, dont waiting till the end of the task

commit d36323e7db3d8dc5ec01a2ae46752342ff01fac5
Author: Sergey Zhemzhitsky <sz...@...>
Date:   2018-11-18T01:01:15Z

    adding some unit tests

commit 12075ec265f0d09cd52865bb91155898b9ede523
Author: Sergey Zhemzhitsky <sz...@...>
Date:   2018-11-18T07:07:27Z

    improved docs

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23083: [SPARK-26114][CORE] ExternalSorter Leak

Posted by szhem <gi...@git.apache.org>.
Github user szhem commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23083#discussion_r235757779
  
    --- Diff: core/src/main/scala/org/apache/spark/TaskContextImpl.scala ---
    @@ -99,6 +99,13 @@ private[spark] class TaskContextImpl(
         this
       }
     
    +  override def remoteTaskCompletionListener(listener: TaskCompletionListener)
    +      : this.type = synchronized {
    +    onCompleteCallbacks -= listener
    --- End diff --
    
    Replaced ArrayBuffer with LinkedHashSet. Thank you!
    
    Another interesting question is why these collections are traversed in the reverse order in [invokeLisneters](https://github.com/apache/spark/pull/23083/files#diff-df60223d4208aff4c67335528a55154cR135) like the following
    ```scala
      private def invokeListeners[T](
          listeners: Seq[T],
          name: String,
          error: Option[Throwable])(
          callback: T => Unit): Unit = {
        val errorMsgs = new ArrayBuffer[String](2)
        // Process callbacks in the reverse order of registration
        listeners.reverse.foreach { listener =>
          ...
        }
    }
    ```
    
     I believe @hvanhovell could help to understand. @hvanhovell Could you please remind why task completion and error listeners are traversed in the reverse order (you seem to the the one who added the corresponding line)?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23083: [SPARK-26114][CORE] ExternalSorter Leak

Posted by szhem <gi...@git.apache.org>.
Github user szhem commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23083#discussion_r236057101
  
    --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala ---
    @@ -727,9 +727,10 @@ private[spark] class ExternalSorter[K, V, C](
         spills.clear()
         forceSpillFiles.foreach(s => s.file.delete())
         forceSpillFiles.clear()
    -    if (map != null || buffer != null) {
    +    if (map != null || buffer != null || readingIterator != null) {
           map = null // So that the memory can be garbage-collected
           buffer = null // So that the memory can be garbage-collected
    +      readingIterator = null // So that the memory can be garbage-collected
    --- End diff --
    
    @advancedxy I've tried to remove all the modifications except for this one and got OutOfMemoryErrors once again. Here are the details:
    
    1. Now there are 4 `ExternalSorter` remained 
    2 of them are not closed ones ...
    ![1_readingiterator_isnull_nonclosed_externalsorter](https://user-images.githubusercontent.com/1523889/48973288-2218d180-f04d-11e8-9329-27b3edf33c48.png)
    and 2 of them are closed ones ...
    ![2_readingiterator_isnull_closed_externalsorter](https://user-images.githubusercontent.com/1523889/48973295-483e7180-f04d-11e8-83cf-23361515363f.png)
    as expected
    2. There are 2 `SpillableIterator`s (which consume a significant part of memory) of already closed `ExternalSorter`s remained
    ![4_readingiterator_isnull_spillableiterator_of_closed_externalsorter](https://user-images.githubusercontent.com/1523889/48973318-cf8be500-f04d-11e8-912f-74be7420ca95.png)
    3. These `SpillableIterator`s are referenced by `CompletionIterator`s ...
    ![6_completioniterator_of_blockstoreshufflereader](https://user-images.githubusercontent.com/1523889/48973357-a6b81f80-f04e-11e8-810f-dc8941430f34.png)
    ... which in their order seem to be referenced by the `cur` field ...
    ![7_coalescedrdd_compute_flatmap](https://user-images.githubusercontent.com/1523889/48973491-7e7df000-f051-11e8-8864-7e9e7f3f994b.png)
    ... of the standard `Iterator`'s `flatMap` that is used in the `compute` method of `CoalescedRDD`
    ![image](https://user-images.githubusercontent.com/1523889/48973401-7fae1d80-f04f-11e8-8cf2-043c808173d9.png)
    
    Standard `Iterator`'s `flatMap` does not clean up its `cur` field before obtaining the next value for it which in its order will consume quite a lot of memory too 
    ![image](https://user-images.githubusercontent.com/1523889/48973418-dfa4c400-f04f-11e8-8f0e-b464567d43de.png)
    .. and in case of Spark that means that the previous iterator consuming the memory will live there while fetching the next value for it
    ![8_coalescedrdd_compute_flatmap_cur_isnotassigned](https://user-images.githubusercontent.com/1523889/48974089-0dddd000-f05f-11e8-8319-f7d1f778f381.png)
    
    So I've returned the changes made to the `CompletionIterator` to reassign the reference of its sub-iterator to the `empty` iterator ...
    ![image](https://user-images.githubusercontent.com/1523889/48973472-27781b00-f051-11e8-86e1-cd6b87cd114b.png)
    
    ... and that has helped. 
    
    P.S. I believe that cleaning up the standard `flatMap`'s iterator `cur` field before calling `nextCur` could help too
    ```scala
      def flatMap[B](f: A => GenTraversableOnce[B]): Iterator[B] = new AbstractIterator[B] {
        private var cur: Iterator[B] = empty
        private def nextCur() { cur = f(self.next()).toIterator }
        def hasNext: Boolean = {
          // Equivalent to cur.hasNext || self.hasNext && { nextCur(); hasNext }
          // but slightly shorter bytecode (better JVM inlining!)
          while (!cur.hasNext) {
            cur = empty
            if (!self.hasNext) return false
            nextCur()
          }
          true
        }
        def next(): B = (if (hasNext) cur else empty).next()
      }
    ```
    



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter Leak

Posted by advancedxy <gi...@git.apache.org>.
Github user advancedxy commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    > For the task completion listener, I think it's an overkill to introduce a new API, do you know where exactly we leak the memory? and can we null it out when the ShuffleBlockFetcherIterator reaches to its end?
    
    If I understand correctly, the memory is leaked because external sorter is referenced in `TaskCompletionListener` and it's only gced when the task is completed. However for `coalesce` or similar APIs, multiple `BlockStoreShuffleReader`s are created as there are multiple input sources, the internal sorter is not released until all shuffle readers are consumed and task is finished.
    
    It's an overkill to introduce a new API. However, I think we can limited it into private[Spark] scope. 
    Like @szhem, I don't figure out another way to null out the sorter reference yet.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter Leak

Posted by advancedxy <gi...@git.apache.org>.
Github user advancedxy commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    And another thing:
    > P.S. This PR does not cover cases with CoGroupedRDDs which use ExternalAppendOnlyMap internally, which itself can lead to OutOfMemoryErrors in many places.
    So do you mean CoGroupRDDs with multiple input sources with have similar problems? If so, can you create another Jira?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter Leak

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    Looking at the code, we are trying to fix 2 memory leaks: the task completion listener in `ShuffleBlockFetcherIterator`, and the `CompletionIterator`. If that's case, can you say that in the PR description?
    
    For the task completion listener, I think it's an overkill to introduce a new API, do we know exactly where we leak the memory? and can we null it out when the `ShuffleBlockFetcherIterator` reaches to its end?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99309/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5437/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter Leak

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    **[Test build #4433 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4433/testReport)** for PR 23083 at commit [`12075ec`](https://github.com/apache/spark/commit/12075ec265f0d09cd52865bb91155898b9ede523).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    **[Test build #99309 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99309/testReport)** for PR 23083 at commit [`1723819`](https://github.com/apache/spark/commit/17238196719de1e68cbcb1eb930cb3176308e437).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23083: [SPARK-26114][CORE] ExternalSorter's readingItera...

Posted by advancedxy <gi...@git.apache.org>.
Github user advancedxy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23083#discussion_r236060023
  
    --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala ---
    @@ -727,9 +727,10 @@ private[spark] class ExternalSorter[K, V, C](
         spills.clear()
         forceSpillFiles.foreach(s => s.file.delete())
         forceSpillFiles.clear()
    -    if (map != null || buffer != null) {
    +    if (map != null || buffer != null || readingIterator != null) {
           map = null // So that the memory can be garbage-collected
           buffer = null // So that the memory can be garbage-collected
    +      readingIterator = null // So that the memory can be garbage-collected
    --- End diff --
    
    Nice. Case well explained.
    
    But I think you need to add corresponding test cases for `CompletionIterator` and `ExternalSorter`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter Leak

Posted by szhem <gi...@git.apache.org>.
Github user szhem commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    Hi @davies, @advancedxy, @rxin,
    You seem to be the last ones who touched the corresponding parts of the files in this PR.
    Could you be so kind to take a look at it?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23083: [SPARK-26114][CORE] ExternalSorter Leak

Posted by advancedxy <gi...@git.apache.org>.
Github user advancedxy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23083#discussion_r235357328
  
    --- Diff: core/src/main/scala/org/apache/spark/TaskContextImpl.scala ---
    @@ -99,6 +99,13 @@ private[spark] class TaskContextImpl(
         this
       }
     
    +  override def remoteTaskCompletionListener(listener: TaskCompletionListener)
    +      : this.type = synchronized {
    +    onCompleteCallbacks -= listener
    --- End diff --
    
    If we are going to add the new interface, I think so.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5406/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99313/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23083: [SPARK-26114][CORE] ExternalSorter Leak

Posted by advancedxy <gi...@git.apache.org>.
Github user advancedxy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23083#discussion_r235297396
  
    --- Diff: core/src/main/scala/org/apache/spark/TaskContext.scala ---
    @@ -127,9 +127,21 @@ abstract class TaskContext extends Serializable {
         // Note that due to this scala bug: https://github.com/scala/bug/issues/11016, we need to make
         // this function polymorphic for every scala version >= 2.12, otherwise an overloaded method
         // resolution error occurs at compile time.
    -    addTaskCompletionListener(new TaskCompletionListener {
    -      override def onTaskCompletion(context: TaskContext): Unit = f(context)
    -    })
    +    addTaskCompletionListener(TaskCompletionListenerWrapper(f))
    +  }
    +
    +  /**
    +   * Removes a (Java friendly) listener that is no longer needed to be executed on task completion.
    +   */
    +  def remoteTaskCompletionListener(listener: TaskCompletionListener): TaskContext
    --- End diff --
    
    you mean `removeTaskCompletionListener`?  didn't you?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter Leak

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    **[Test build #99351 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99351/testReport)** for PR 23083 at commit [`1723819`](https://github.com/apache/spark/commit/17238196719de1e68cbcb1eb930cb3176308e437).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99360/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    **[Test build #99326 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99326/testReport)** for PR 23083 at commit [`1723819`](https://github.com/apache/spark/commit/17238196719de1e68cbcb1eb930cb3176308e437).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23083: [SPARK-26114][CORE] ExternalSorter Leak

Posted by szhem <gi...@git.apache.org>.
Github user szhem commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23083#discussion_r235314949
  
    --- Diff: core/src/main/scala/org/apache/spark/TaskContextImpl.scala ---
    @@ -99,6 +99,13 @@ private[spark] class TaskContextImpl(
         this
       }
     
    +  override def remoteTaskCompletionListener(listener: TaskCompletionListener)
    +      : this.type = synchronized {
    +    onCompleteCallbacks -= listener
    --- End diff --
    
    Should we do the same thing (i.e. chaning ArrayBuffer to LinkedHashSet) for onFailureCallbacks too?
    ```scala
      /** List of callback functions to execute when the task completes. */
      @transient private val onCompleteCallbacks = new ArrayBuffer[TaskCompletionListener]
    
      /** List of callback functions to execute when the task fails. */
      @transient private val onFailureCallbacks = new ArrayBuffer[TaskFailureListener]
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter Leak

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    **[Test build #4433 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4433/testReport)** for PR 23083 at commit [`12075ec`](https://github.com/apache/spark/commit/12075ec265f0d09cd52865bb91155898b9ede523).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23083: [SPARK-26114][CORE] ExternalSorter Leak

Posted by szhem <gi...@git.apache.org>.
Github user szhem commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23083#discussion_r235308451
  
    --- Diff: core/src/main/scala/org/apache/spark/TaskContext.scala ---
    @@ -127,9 +127,21 @@ abstract class TaskContext extends Serializable {
         // Note that due to this scala bug: https://github.com/scala/bug/issues/11016, we need to make
         // this function polymorphic for every scala version >= 2.12, otherwise an overloaded method
         // resolution error occurs at compile time.
    -    addTaskCompletionListener(new TaskCompletionListener {
    -      override def onTaskCompletion(context: TaskContext): Unit = f(context)
    -    })
    +    addTaskCompletionListener(TaskCompletionListenerWrapper(f))
    +  }
    +
    +  /**
    +   * Removes a (Java friendly) listener that is no longer needed to be executed on task completion.
    +   */
    +  def remoteTaskCompletionListener(listener: TaskCompletionListener): TaskContext
    --- End diff --
    
    Yep, seems that `v` was replaced with `t` on my keyboard)
    Thanks a lot!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23083: [SPARK-26114][CORE] ExternalSorter Leak

Posted by szhem <gi...@git.apache.org>.
Github user szhem commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23083#discussion_r235769204
  
    --- Diff: core/src/main/scala/org/apache/spark/shuffle/BlockStoreShuffleReader.scala ---
    @@ -103,11 +116,26 @@ private[spark] class BlockStoreShuffleReader[K, C](
             context.taskMetrics().incMemoryBytesSpilled(sorter.memoryBytesSpilled)
             context.taskMetrics().incDiskBytesSpilled(sorter.diskBytesSpilled)
             context.taskMetrics().incPeakExecutionMemory(sorter.peakMemoryUsedBytes)
    +        val taskListener = new TaskCompletionListener {
    +          override def onTaskCompletion(context: TaskContext): Unit = sorter.stop()
    +        }
             // Use completion callback to stop sorter if task was finished/cancelled.
    -        context.addTaskCompletionListener[Unit](_ => {
    -          sorter.stop()
    -        })
    -        CompletionIterator[Product2[K, C], Iterator[Product2[K, C]]](sorter.iterator, sorter.stop())
    +        context.addTaskCompletionListener(taskListener)
    +        CompletionIterator[Product2[K, C], Iterator[Product2[K, C]]](
    +          sorter.iterator,
    +          {
    +            sorter.stop()
    +            // remove task completion listener as soon as the sorter stops to prevent holding
    +            // its references till the end of the task which may lead to memory leaks, for
    +            // example, in case of processing multiple ShuffledRDDPartitions by a single task
    +            // like in case of CoalescedRDD occurred after the ShuffledRDD in the same stage
    +            // (e.g. rdd.repartition(1000).coalesce(10));
    +            // note that holding sorter references till the end of the task also holds
    +            // references to PartitionedAppendOnlyMap and PartitionedPairBuffer too and these
    +            // ones may consume a significant part of the available memory
    +            context.remoteTaskCompletionListener(taskListener)
    --- End diff --
    
    Great question! Honestly speaking I don't have pretty good solution right now.
    TaskCompletionListener stops sorter in case of task failures, cancels, etc., i.e. in case of abnormal termination. In "happy path" case task completion listener is not needed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    **[Test build #99254 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99254/testReport)** for PR 23083 at commit [`1723819`](https://github.com/apache/spark/commit/17238196719de1e68cbcb1eb930cb3176308e437).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    **[Test build #99351 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99351/testReport)** for PR 23083 at commit [`1723819`](https://github.com/apache/spark/commit/17238196719de1e68cbcb1eb930cb3176308e437).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99267/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    **[Test build #99326 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99326/testReport)** for PR 23083 at commit [`1723819`](https://github.com/apache/spark/commit/17238196719de1e68cbcb1eb930cb3176308e437).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    **[Test build #99309 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99309/testReport)** for PR 23083 at commit [`1723819`](https://github.com/apache/spark/commit/17238196719de1e68cbcb1eb930cb3176308e437).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99254/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23083: [SPARK-26114][CORE] ExternalSorter Leak

Posted by advancedxy <gi...@git.apache.org>.
Github user advancedxy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23083#discussion_r235300436
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
    @@ -72,7 +73,8 @@ final class ShuffleBlockFetcherIterator(
         maxBlocksInFlightPerAddress: Int,
         maxReqSizeShuffleToMem: Long,
         detectCorrupt: Boolean)
    -  extends Iterator[(BlockId, InputStream)] with DownloadFileManager with Logging {
    +  extends Iterator[(BlockId, InputStream)] with DownloadFileManager with TaskCompletionListener
    --- End diff --
    
    I don't it's a good idea for ShuffleBlockFetchIterator to be a subclass of TaskCompletionListener. 
    What's wrong with original solution?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter Leak

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23083: [SPARK-26114][CORE] ExternalSorter Leak

Posted by advancedxy <gi...@git.apache.org>.
Github user advancedxy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23083#discussion_r235299353
  
    --- Diff: core/src/main/scala/org/apache/spark/TaskContextImpl.scala ---
    @@ -99,6 +99,13 @@ private[spark] class TaskContextImpl(
         this
       }
     
    +  override def remoteTaskCompletionListener(listener: TaskCompletionListener)
    +      : this.type = synchronized {
    +    onCompleteCallbacks -= listener
    --- End diff --
    
    I'm not use whether we should add `removeTaskCompletionListener` or not.
    
    If we are going to add this method. Then this's an O(n) operation.  Maybe we need to replace onCompletedCallbacks to a LinkedHashSet?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter Leak

Posted by szhem <gi...@git.apache.org>.
Github user szhem commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    > So do you mean CoGroupRDDs with multiple input sources will have similar problems?
    
    Yep, but a little bit different ones
    
    > If so, can you create another Jira?
    
    Will do it shortly.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    **[Test build #99267 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99267/testReport)** for PR 23083 at commit [`1723819`](https://github.com/apache/spark/commit/17238196719de1e68cbcb1eb930cb3176308e437).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23083: [SPARK-26114][CORE] ExternalSorter Leak

Posted by advancedxy <gi...@git.apache.org>.
Github user advancedxy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23083#discussion_r235299656
  
    --- Diff: core/src/main/scala/org/apache/spark/shuffle/BlockStoreShuffleReader.scala ---
    @@ -103,11 +116,26 @@ private[spark] class BlockStoreShuffleReader[K, C](
             context.taskMetrics().incMemoryBytesSpilled(sorter.memoryBytesSpilled)
             context.taskMetrics().incDiskBytesSpilled(sorter.diskBytesSpilled)
             context.taskMetrics().incPeakExecutionMemory(sorter.peakMemoryUsedBytes)
    +        val taskListener = new TaskCompletionListener {
    +          override def onTaskCompletion(context: TaskContext): Unit = sorter.stop()
    +        }
             // Use completion callback to stop sorter if task was finished/cancelled.
    -        context.addTaskCompletionListener[Unit](_ => {
    -          sorter.stop()
    -        })
    -        CompletionIterator[Product2[K, C], Iterator[Product2[K, C]]](sorter.iterator, sorter.stop())
    +        context.addTaskCompletionListener(taskListener)
    +        CompletionIterator[Product2[K, C], Iterator[Product2[K, C]]](
    +          sorter.iterator,
    +          {
    +            sorter.stop()
    +            // remove task completion listener as soon as the sorter stops to prevent holding
    +            // its references till the end of the task which may lead to memory leaks, for
    +            // example, in case of processing multiple ShuffledRDDPartitions by a single task
    +            // like in case of CoalescedRDD occurred after the ShuffledRDD in the same stage
    +            // (e.g. rdd.repartition(1000).coalesce(10));
    +            // note that holding sorter references till the end of the task also holds
    +            // references to PartitionedAppendOnlyMap and PartitionedPairBuffer too and these
    +            // ones may consume a significant part of the available memory
    +            context.remoteTaskCompletionListener(taskListener)
    --- End diff --
    
    Nice catch.
    
    Liked I said in the above, do we have another way to remove reference to sorter?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23083: [SPARK-26114][CORE] ExternalSorter Leak

Posted by advancedxy <gi...@git.apache.org>.
Github user advancedxy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23083#discussion_r235955118
  
    --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala ---
    @@ -727,9 +727,10 @@ private[spark] class ExternalSorter[K, V, C](
         spills.clear()
         forceSpillFiles.foreach(s => s.file.delete())
         forceSpillFiles.clear()
    -    if (map != null || buffer != null) {
    +    if (map != null || buffer != null || readingIterator != null) {
           map = null // So that the memory can be garbage-collected
           buffer = null // So that the memory can be garbage-collected
    +      readingIterator = null // So that the memory can be garbage-collected
    --- End diff --
    
    Hi @szhem , I discussed with wenchen offline. I think this is the key point. After nulling out `readingIterator`, `ExternalSorter` should released all the memories it occupied.
    
    Yes, `ExternalSorter` is leaked in `TaskCompletionListener`,  but it would already be stopped in `CompletionIterator` in happy path. The stopped sorter wouldn't occupy too much memory. The `readingIterator` 
    is occupying memory because it may reference `map/buffer.partitionedDestructiveSortedIterator`, which itself references `map/buffer`. So only nulling out map or buffer is not enough.
    
    Can you try with this modification only and see whether OOM still occurs.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by szhem <gi...@git.apache.org>.
Github user szhem commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    Hi @cloud-fan 
    
    > Looking at the code, we are trying to fix 2 memory leaks: the task completion listener in ShuffleBlockFetcherIterator, and the CompletionIterator. If that's case, can you say that in the PR description?
    
    I've updated the description and the title of this PR correspondingly.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    thanks, merging to master/2.4!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23083: [SPARK-26114][CORE] ExternalSorter's readingItera...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/23083


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23083: [SPARK-26114][CORE] ExternalSorter Leak

Posted by szhem <gi...@git.apache.org>.
Github user szhem commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23083#discussion_r235319165
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
    @@ -72,7 +73,8 @@ final class ShuffleBlockFetcherIterator(
         maxBlocksInFlightPerAddress: Int,
         maxReqSizeShuffleToMem: Long,
         detectCorrupt: Boolean)
    -  extends Iterator[(BlockId, InputStream)] with DownloadFileManager with Logging {
    +  extends Iterator[(BlockId, InputStream)] with DownloadFileManager with TaskCompletionListener
    --- End diff --
    
    The main reason is that TaskCompletionListener is added in one place (in `initialize` method) and needs to be removed in another one (in `cleanup` method).
    ![image](https://user-images.githubusercontent.com/1523889/48833554-abe64780-ed8c-11e8-9da0-ef826918a275.png)
    Will introduce a field for TaskCompletionListener instead. Thank you!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23083: [SPARK-26114][CORE] ExternalSorter's readingItera...

Posted by szhem <gi...@git.apache.org>.
Github user szhem commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23083#discussion_r236083618
  
    --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala ---
    @@ -727,9 +727,10 @@ private[spark] class ExternalSorter[K, V, C](
         spills.clear()
         forceSpillFiles.foreach(s => s.file.delete())
         forceSpillFiles.clear()
    -    if (map != null || buffer != null) {
    +    if (map != null || buffer != null || readingIterator != null) {
           map = null // So that the memory can be garbage-collected
           buffer = null // So that the memory can be garbage-collected
    +      readingIterator = null // So that the memory can be garbage-collected
    --- End diff --
    
    I've added [test case for CompletionIterator](https://github.com/apache/spark/pull/23083/files#diff-444ed6b5e5333c3359cecec7d082396dR50).
    
    Regarding `ExternalSorter` - taking into account that only the private api has been changed and there no similar test cases which verify that private `map` and `buffer` fields are set to `null` after sorter stops, don't you think that already existing tests cover the situation with `readingIterator` too?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23083: [SPARK-26114][CORE] ExternalSorter Leak

Posted by szhem <gi...@git.apache.org>.
Github user szhem commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23083#discussion_r235759169
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
    @@ -72,7 +73,8 @@ final class ShuffleBlockFetcherIterator(
         maxBlocksInFlightPerAddress: Int,
         maxReqSizeShuffleToMem: Long,
         detectCorrupt: Boolean)
    -  extends Iterator[(BlockId, InputStream)] with DownloadFileManager with Logging {
    +  extends Iterator[(BlockId, InputStream)] with DownloadFileManager with TaskCompletionListener
    --- End diff --
    
    Introduced [the corresponding field](https://github.com/apache/spark/pull/23083/files#diff-27109eb30a77542d377c936e0d134420R156).
    ```scala
      /**
       * Task completion callback to be called in both success as well as failure cases to cleanup.
       * It may not be called at all in case the `cleanup` method has already been called before
       * task completion.
       */
      private[this] val cleanupTaskCompletionListener = (_: TaskContext) => cleanup()
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23083: [SPARK-26114][CORE] ExternalSorter Leak

Posted by advancedxy <gi...@git.apache.org>.
Github user advancedxy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23083#discussion_r235357628
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
    @@ -72,7 +73,8 @@ final class ShuffleBlockFetcherIterator(
         maxBlocksInFlightPerAddress: Int,
         maxReqSizeShuffleToMem: Long,
         detectCorrupt: Boolean)
    -  extends Iterator[(BlockId, InputStream)] with DownloadFileManager with Logging {
    +  extends Iterator[(BlockId, InputStream)] with DownloadFileManager with TaskCompletionListener
    --- End diff --
    
    Another field sounds reasonable.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5341/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    **[Test build #99313 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99313/testReport)** for PR 23083 at commit [`1723819`](https://github.com/apache/spark/commit/17238196719de1e68cbcb1eb930cb3176308e437).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    **[Test build #99254 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99254/testReport)** for PR 23083 at commit [`1723819`](https://github.com/apache/spark/commit/17238196719de1e68cbcb1eb930cb3176308e437).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    **[Test build #99360 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99360/testReport)** for PR 23083 at commit [`1723819`](https://github.com/apache/spark/commit/17238196719de1e68cbcb1eb930cb3176308e437).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5396/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    LGTM, thanks for your great work!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99326/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    **[Test build #99267 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99267/testReport)** for PR 23083 at commit [`1723819`](https://github.com/apache/spark/commit/17238196719de1e68cbcb1eb930cb3176308e437).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5428/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    ok to test


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99351/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5392/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5351/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    **[Test build #99313 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99313/testReport)** for PR 23083 at commit [`1723819`](https://github.com/apache/spark/commit/17238196719de1e68cbcb1eb930cb3176308e437).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter Leak

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    **[Test build #99360 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99360/testReport)** for PR 23083 at commit [`1723819`](https://github.com/apache/spark/commit/17238196719de1e68cbcb1eb930cb3176308e437).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org