You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@uniffle.apache.org by GitBox <gi...@apache.org> on 2022/08/22 07:35:14 UTC
[GitHub] [incubator-uniffle] izchen opened a new pull request, #174: [BUGFIX] Fix resource leak when shuffle read
izchen opened a new pull request, #174:
URL: https://github.com/apache/incubator-uniffle/pull/174
### What changes were proposed in this pull request?
Use `org.apache.spark.TaskContext#addTaskCompletionListener` to clean up resources used by `RssShuffleDataIterator`. This avoids possible resource leaks.
### Why are the changes needed?
Before this PR, `RssShuffleDataIterator` would only clean up used resources after all records read.
When the `Task` fails or cancels, or runs some special logic such as `LocalLimit`, the resource will not be cleaned up. This creates potential resource leaks.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Added a UT case
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org
[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #174: [BUGFIX] Fix resource leak when shuffle read
Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #174:
URL: https://github.com/apache/incubator-uniffle/pull/174#discussion_r951170368
##########
client-spark/spark3/src/main/java/org/apache/spark/shuffle/reader/RssShuffleReader.java:
##########
@@ -201,13 +202,21 @@ public Configuration getHadoopConf() {
RssShuffleDataIterator iterator = new RssShuffleDataIterator<K, C>(
shuffleDependency.serializer(), shuffleReadClient,
readMetrics);
- iterators.add(iterator);
+ CompletionIterator<Product2<K, C>, RssShuffleDataIterator<K, C>> completionIterator =
Review Comment:
> `org.apache.spark.util.CompletionIterator` will clean up as soon as its wrapped `RssShuffleDataIterator.hasNext` returns `false `:
>
> ```
> // org.apache.spark.util.CompletionIterator
> def hasNext: Boolean = {
> val r = iter.hasNext // iter => RssShuffleDataIterator
> if (!r && !completed) {
> completed = true
> // reassign to release resources of highly resource consuming iterators early
> iter = Iterator.empty.asInstanceOf[I]
> completion()
> }
> r
> }
>
> def completion(): Unit // completion() => RssShuffleDataIterator.cleanup
> ```
>
> After this PR, if there is no special case for the `Spark Task`, the timing of resource cleanup is still when the `RssShuffleDataIterator` ends, not when the `Spark Task` ends.
>
> This is the same behavior as before the PR.
OK, I got it. Good catch.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org
[GitHub] [incubator-uniffle] izchen commented on pull request #174: [BUGFIX] Fix resource leak when shuffle read
Posted by GitBox <gi...@apache.org>.
izchen commented on PR #174:
URL: https://github.com/apache/incubator-uniffle/pull/174#issuecomment-1222073866
Thanks for your review, @jerqi !
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org
[GitHub] [incubator-uniffle] codecov-commenter commented on pull request #174: [BUGFIX] Fix resource leak when shuffle read
Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #174:
URL: https://github.com/apache/incubator-uniffle/pull/174#issuecomment-1222019681
# [Codecov](https://codecov.io/gh/apache/incubator-uniffle/pull/174?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#174](https://codecov.io/gh/apache/incubator-uniffle/pull/174?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (d492aea) into [master](https://codecov.io/gh/apache/incubator-uniffle/commit/74cd40b9fc5f41b8b04fbaac9bab31985e7b6d15?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (74cd40b) will **decrease** coverage by `1.37%`.
> The diff coverage is `n/a`.
```diff
@@ Coverage Diff @@
## master #174 +/- ##
============================================
- Coverage 58.29% 56.92% -1.38%
+ Complexity 1262 1183 -79
============================================
Files 158 149 -9
Lines 8397 7902 -495
Branches 779 749 -30
============================================
- Hits 4895 4498 -397
+ Misses 3251 3162 -89
+ Partials 251 242 -9
```
| [Impacted Files](https://codecov.io/gh/apache/incubator-uniffle/pull/174?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [...storage/handler/impl/DataSkippableReadHandler.java](https://codecov.io/gh/apache/incubator-uniffle/pull/174/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3RvcmFnZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvdW5pZmZsZS9zdG9yYWdlL2hhbmRsZXIvaW1wbC9EYXRhU2tpcHBhYmxlUmVhZEhhbmRsZXIuamF2YQ==) | `81.25% <0.00%> (-3.13%)` | :arrow_down: |
| [...org/apache/uniffle/server/ShuffleFlushManager.java](https://codecov.io/gh/apache/incubator-uniffle/pull/174/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS91bmlmZmxlL3NlcnZlci9TaHVmZmxlRmx1c2hNYW5hZ2VyLmphdmE=) | `77.65% <0.00%> (-1.68%)` | :arrow_down: |
| [...e/spark/shuffle/reader/RssShuffleDataIterator.java](https://codecov.io/gh/apache/incubator-uniffle/pull/174/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y2xpZW50LXNwYXJrL2NvbW1vbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvc3Bhcmsvc2h1ZmZsZS9yZWFkZXIvUnNzU2h1ZmZsZURhdGFJdGVyYXRvci5qYXZh) | | |
| [...ava/org/apache/spark/shuffle/RssShuffleHandle.java](https://codecov.io/gh/apache/incubator-uniffle/pull/174/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y2xpZW50LXNwYXJrL2NvbW1vbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvc3Bhcmsvc2h1ZmZsZS9Sc3NTaHVmZmxlSGFuZGxlLmphdmE=) | | |
| [...org/apache/spark/shuffle/RssSparkShuffleUtils.java](https://codecov.io/gh/apache/incubator-uniffle/pull/174/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y2xpZW50LXNwYXJrL2NvbW1vbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvc3Bhcmsvc2h1ZmZsZS9Sc3NTcGFya1NodWZmbGVVdGlscy5qYXZh) | | |
| [...che/spark/shuffle/writer/BufferManagerOptions.java](https://codecov.io/gh/apache/incubator-uniffle/pull/174/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y2xpZW50LXNwYXJrL2NvbW1vbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvc3Bhcmsvc2h1ZmZsZS93cml0ZXIvQnVmZmVyTWFuYWdlck9wdGlvbnMuamF2YQ==) | | |
| [.../java/org/apache/spark/shuffle/RssSparkConfig.java](https://codecov.io/gh/apache/incubator-uniffle/pull/174/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y2xpZW50LXNwYXJrL2NvbW1vbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvc3Bhcmsvc2h1ZmZsZS9Sc3NTcGFya0NvbmZpZy5qYXZh) | | |
| [...org/apache/spark/shuffle/writer/AddBlockEvent.java](https://codecov.io/gh/apache/incubator-uniffle/pull/174/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y2xpZW50LXNwYXJrL2NvbW1vbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvc3Bhcmsvc2h1ZmZsZS93cml0ZXIvQWRkQmxvY2tFdmVudC5qYXZh) | | |
| [...k/shuffle/writer/WrappedByteArrayOutputStream.java](https://codecov.io/gh/apache/incubator-uniffle/pull/174/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y2xpZW50LXNwYXJrL2NvbW1vbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvc3Bhcmsvc2h1ZmZsZS93cml0ZXIvV3JhcHBlZEJ5dGVBcnJheU91dHB1dFN0cmVhbS5qYXZh) | | |
| [...pache/spark/shuffle/writer/WriteBufferManager.java](https://codecov.io/gh/apache/incubator-uniffle/pull/174/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y2xpZW50LXNwYXJrL2NvbW1vbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvc3Bhcmsvc2h1ZmZsZS93cml0ZXIvV3JpdGVCdWZmZXJNYW5hZ2VyLmphdmE=) | | |
| ... and [1 more](https://codecov.io/gh/apache/incubator-uniffle/pull/174/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
:mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org
[GitHub] [incubator-uniffle] izchen commented on a diff in pull request #174: [BUGFIX] Fix resource leak when shuffle read
Posted by GitBox <gi...@apache.org>.
izchen commented on code in PR #174:
URL: https://github.com/apache/incubator-uniffle/pull/174#discussion_r951161459
##########
client-spark/spark3/src/main/java/org/apache/spark/shuffle/reader/RssShuffleReader.java:
##########
@@ -201,13 +202,21 @@ public Configuration getHadoopConf() {
RssShuffleDataIterator iterator = new RssShuffleDataIterator<K, C>(
shuffleDependency.serializer(), shuffleReadClient,
readMetrics);
- iterators.add(iterator);
+ CompletionIterator<Product2<K, C>, RssShuffleDataIterator<K, C>> completionIterator =
Review Comment:
`org.apache.spark.util.CompletionIterator` will clean up as soon as its wrapped `RssShuffleDataIterator.hasNext` returns `false `:
```
// org.apache.spark.util.CompletionIterator
def hasNext: Boolean = {
val r = iter.hasNext // iter => RssShuffleDataIterator
if (!r && !completed) {
completed = true
// reassign to release resources of highly resource consuming iterators early
iter = Iterator.empty.asInstanceOf[I]
completion()
}
r
}
def completion(): Unit // completion() => RssShuffleDataIterator.cleanup
```
After this PR, if there is no special case for the `Spark Task`, the timing of resource cleanup is still when the `RssShuffleDataIterator` ends, not when the `Spark Task` ends.
This is the same behavior as before the PR.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org
[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #174: [BUGFIX] Fix resource leak when shuffle read
Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #174:
URL: https://github.com/apache/incubator-uniffle/pull/174#discussion_r951119866
##########
client-spark/spark3/src/main/java/org/apache/spark/shuffle/reader/RssShuffleReader.java:
##########
@@ -201,13 +202,21 @@ public Configuration getHadoopConf() {
RssShuffleDataIterator iterator = new RssShuffleDataIterator<K, C>(
shuffleDependency.serializer(), shuffleReadClient,
readMetrics);
- iterators.add(iterator);
+ CompletionIterator<Product2<K, C>, RssShuffleDataIterator<K, C>> completionIterator =
Review Comment:
If we use AQE, we could have many iterators, if we don't release the resource of them after we use them, we may occur OOM. I means that we use one iterator, and then release the iterator. We shouldn't release all the iterators at the end of task.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org
[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #174: [BUGFIX] Fix resource leak when shuffle read
Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #174:
URL: https://github.com/apache/incubator-uniffle/pull/174#discussion_r951119866
##########
client-spark/spark3/src/main/java/org/apache/spark/shuffle/reader/RssShuffleReader.java:
##########
@@ -201,13 +202,21 @@ public Configuration getHadoopConf() {
RssShuffleDataIterator iterator = new RssShuffleDataIterator<K, C>(
shuffleDependency.serializer(), shuffleReadClient,
readMetrics);
- iterators.add(iterator);
+ CompletionIterator<Product2<K, C>, RssShuffleDataIterator<K, C>> completionIterator =
Review Comment:
If we use AQE, we could have many iterators, if we don't release the memory of them after we use them, we may occur OOM.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org
[GitHub] [incubator-uniffle] jerqi merged pull request #174: [BUGFIX] Fix resource leak when shuffle read
Posted by GitBox <gi...@apache.org>.
jerqi merged PR #174:
URL: https://github.com/apache/incubator-uniffle/pull/174
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org