You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@uniffle.apache.org by GitBox <gi...@apache.org> on 2022/08/22 07:35:14 UTC

[GitHub] [incubator-uniffle] izchen opened a new pull request, #174: [BUGFIX] Fix resource leak when shuffle read

izchen opened a new pull request, #174:
URL: https://github.com/apache/incubator-uniffle/pull/174

   ### What changes were proposed in this pull request?
   Use `org.apache.spark.TaskContext#addTaskCompletionListener` to clean up resources used by `RssShuffleDataIterator`. This avoids possible resource leaks.
   
   
   ### Why are the changes needed?
   Before this PR, `RssShuffleDataIterator` would only clean up used resources after all records read.
   
   When the `Task` fails or cancels, or runs some special logic such as `LocalLimit`, the resource will not be cleaned up. This creates potential resource leaks.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Added a UT case


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #174: [BUGFIX] Fix resource leak when shuffle read

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #174:
URL: https://github.com/apache/incubator-uniffle/pull/174#discussion_r951170368


##########
client-spark/spark3/src/main/java/org/apache/spark/shuffle/reader/RssShuffleReader.java:
##########
@@ -201,13 +202,21 @@ public Configuration getHadoopConf() {
         RssShuffleDataIterator iterator = new RssShuffleDataIterator<K, C>(
             shuffleDependency.serializer(), shuffleReadClient,
             readMetrics);
-        iterators.add(iterator);
+        CompletionIterator<Product2<K, C>, RssShuffleDataIterator<K, C>> completionIterator =

Review Comment:
   > `org.apache.spark.util.CompletionIterator` will clean up as soon as its wrapped `RssShuffleDataIterator.hasNext` returns `false `:
   > 
   > ```
   > // org.apache.spark.util.CompletionIterator
   >   def hasNext: Boolean = {
   >     val r = iter.hasNext  // iter => RssShuffleDataIterator
   >     if (!r && !completed) {
   >       completed = true
   >       // reassign to release resources of highly resource consuming iterators early
   >       iter = Iterator.empty.asInstanceOf[I]
   >       completion()
   >     }
   >     r
   >   }
   > 
   >   def completion(): Unit  // completion() => RssShuffleDataIterator.cleanup
   > ```
   > 
   > After this PR, if there is no special case for the `Spark Task`, the timing of resource cleanup is still when the `RssShuffleDataIterator` ends, not when the `Spark Task` ends.
   > 
   > This is the same behavior as before the PR.
   
   OK, I got it. Good catch.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] izchen commented on pull request #174: [BUGFIX] Fix resource leak when shuffle read

Posted by GitBox <gi...@apache.org>.
izchen commented on PR #174:
URL: https://github.com/apache/incubator-uniffle/pull/174#issuecomment-1222073866

   Thanks for your review, @jerqi !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] codecov-commenter commented on pull request #174: [BUGFIX] Fix resource leak when shuffle read

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #174:
URL: https://github.com/apache/incubator-uniffle/pull/174#issuecomment-1222019681

   # [Codecov](https://codecov.io/gh/apache/incubator-uniffle/pull/174?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#174](https://codecov.io/gh/apache/incubator-uniffle/pull/174?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (d492aea) into [master](https://codecov.io/gh/apache/incubator-uniffle/commit/74cd40b9fc5f41b8b04fbaac9bab31985e7b6d15?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (74cd40b) will **decrease** coverage by `1.37%`.
   > The diff coverage is `n/a`.
   
   ```diff
   @@             Coverage Diff              @@
   ##             master     #174      +/-   ##
   ============================================
   - Coverage     58.29%   56.92%   -1.38%     
   + Complexity     1262     1183      -79     
   ============================================
     Files           158      149       -9     
     Lines          8397     7902     -495     
     Branches        779      749      -30     
   ============================================
   - Hits           4895     4498     -397     
   + Misses         3251     3162      -89     
   + Partials        251      242       -9     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-uniffle/pull/174?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...storage/handler/impl/DataSkippableReadHandler.java](https://codecov.io/gh/apache/incubator-uniffle/pull/174/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3RvcmFnZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvdW5pZmZsZS9zdG9yYWdlL2hhbmRsZXIvaW1wbC9EYXRhU2tpcHBhYmxlUmVhZEhhbmRsZXIuamF2YQ==) | `81.25% <0.00%> (-3.13%)` | :arrow_down: |
   | [...org/apache/uniffle/server/ShuffleFlushManager.java](https://codecov.io/gh/apache/incubator-uniffle/pull/174/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS91bmlmZmxlL3NlcnZlci9TaHVmZmxlRmx1c2hNYW5hZ2VyLmphdmE=) | `77.65% <0.00%> (-1.68%)` | :arrow_down: |
   | [...e/spark/shuffle/reader/RssShuffleDataIterator.java](https://codecov.io/gh/apache/incubator-uniffle/pull/174/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y2xpZW50LXNwYXJrL2NvbW1vbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvc3Bhcmsvc2h1ZmZsZS9yZWFkZXIvUnNzU2h1ZmZsZURhdGFJdGVyYXRvci5qYXZh) | | |
   | [...ava/org/apache/spark/shuffle/RssShuffleHandle.java](https://codecov.io/gh/apache/incubator-uniffle/pull/174/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y2xpZW50LXNwYXJrL2NvbW1vbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvc3Bhcmsvc2h1ZmZsZS9Sc3NTaHVmZmxlSGFuZGxlLmphdmE=) | | |
   | [...org/apache/spark/shuffle/RssSparkShuffleUtils.java](https://codecov.io/gh/apache/incubator-uniffle/pull/174/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y2xpZW50LXNwYXJrL2NvbW1vbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvc3Bhcmsvc2h1ZmZsZS9Sc3NTcGFya1NodWZmbGVVdGlscy5qYXZh) | | |
   | [...che/spark/shuffle/writer/BufferManagerOptions.java](https://codecov.io/gh/apache/incubator-uniffle/pull/174/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y2xpZW50LXNwYXJrL2NvbW1vbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvc3Bhcmsvc2h1ZmZsZS93cml0ZXIvQnVmZmVyTWFuYWdlck9wdGlvbnMuamF2YQ==) | | |
   | [.../java/org/apache/spark/shuffle/RssSparkConfig.java](https://codecov.io/gh/apache/incubator-uniffle/pull/174/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y2xpZW50LXNwYXJrL2NvbW1vbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvc3Bhcmsvc2h1ZmZsZS9Sc3NTcGFya0NvbmZpZy5qYXZh) | | |
   | [...org/apache/spark/shuffle/writer/AddBlockEvent.java](https://codecov.io/gh/apache/incubator-uniffle/pull/174/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y2xpZW50LXNwYXJrL2NvbW1vbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvc3Bhcmsvc2h1ZmZsZS93cml0ZXIvQWRkQmxvY2tFdmVudC5qYXZh) | | |
   | [...k/shuffle/writer/WrappedByteArrayOutputStream.java](https://codecov.io/gh/apache/incubator-uniffle/pull/174/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y2xpZW50LXNwYXJrL2NvbW1vbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvc3Bhcmsvc2h1ZmZsZS93cml0ZXIvV3JhcHBlZEJ5dGVBcnJheU91dHB1dFN0cmVhbS5qYXZh) | | |
   | [...pache/spark/shuffle/writer/WriteBufferManager.java](https://codecov.io/gh/apache/incubator-uniffle/pull/174/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y2xpZW50LXNwYXJrL2NvbW1vbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvc3Bhcmsvc2h1ZmZsZS93cml0ZXIvV3JpdGVCdWZmZXJNYW5hZ2VyLmphdmE=) | | |
   | ... and [1 more](https://codecov.io/gh/apache/incubator-uniffle/pull/174/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] izchen commented on a diff in pull request #174: [BUGFIX] Fix resource leak when shuffle read

Posted by GitBox <gi...@apache.org>.
izchen commented on code in PR #174:
URL: https://github.com/apache/incubator-uniffle/pull/174#discussion_r951161459


##########
client-spark/spark3/src/main/java/org/apache/spark/shuffle/reader/RssShuffleReader.java:
##########
@@ -201,13 +202,21 @@ public Configuration getHadoopConf() {
         RssShuffleDataIterator iterator = new RssShuffleDataIterator<K, C>(
             shuffleDependency.serializer(), shuffleReadClient,
             readMetrics);
-        iterators.add(iterator);
+        CompletionIterator<Product2<K, C>, RssShuffleDataIterator<K, C>> completionIterator =

Review Comment:
   `org.apache.spark.util.CompletionIterator` will clean up as soon as its wrapped `RssShuffleDataIterator.hasNext` returns `false `:
   
   ```
   // org.apache.spark.util.CompletionIterator
     def hasNext: Boolean = {
       val r = iter.hasNext  // iter => RssShuffleDataIterator
       if (!r && !completed) {
         completed = true
         // reassign to release resources of highly resource consuming iterators early
         iter = Iterator.empty.asInstanceOf[I]
         completion()
       }
       r
     }
   
     def completion(): Unit  // completion() => RssShuffleDataIterator.cleanup
   ```
   
   After this PR, if there is no special case for the `Spark Task`, the timing of resource cleanup is still when the `RssShuffleDataIterator` ends, not when the `Spark Task` ends.
   
   This is the same behavior as before the PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #174: [BUGFIX] Fix resource leak when shuffle read

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #174:
URL: https://github.com/apache/incubator-uniffle/pull/174#discussion_r951119866


##########
client-spark/spark3/src/main/java/org/apache/spark/shuffle/reader/RssShuffleReader.java:
##########
@@ -201,13 +202,21 @@ public Configuration getHadoopConf() {
         RssShuffleDataIterator iterator = new RssShuffleDataIterator<K, C>(
             shuffleDependency.serializer(), shuffleReadClient,
             readMetrics);
-        iterators.add(iterator);
+        CompletionIterator<Product2<K, C>, RssShuffleDataIterator<K, C>> completionIterator =

Review Comment:
   If we use AQE, we could have many iterators, if we don't release the resource of them after we use them, we may occur OOM. I means that we use one iterator, and then release the iterator. We shouldn't release all the iterators at the end of task.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #174: [BUGFIX] Fix resource leak when shuffle read

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #174:
URL: https://github.com/apache/incubator-uniffle/pull/174#discussion_r951119866


##########
client-spark/spark3/src/main/java/org/apache/spark/shuffle/reader/RssShuffleReader.java:
##########
@@ -201,13 +202,21 @@ public Configuration getHadoopConf() {
         RssShuffleDataIterator iterator = new RssShuffleDataIterator<K, C>(
             shuffleDependency.serializer(), shuffleReadClient,
             readMetrics);
-        iterators.add(iterator);
+        CompletionIterator<Product2<K, C>, RssShuffleDataIterator<K, C>> completionIterator =

Review Comment:
   If we use AQE, we could have many iterators, if we don't release the memory of them after we use them, we may occur OOM.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi merged pull request #174: [BUGFIX] Fix resource leak when shuffle read

Posted by GitBox <gi...@apache.org>.
jerqi merged PR #174:
URL: https://github.com/apache/incubator-uniffle/pull/174


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org