You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@celeborn.apache.org by "FMX (via GitHub)" <gi...@apache.org> on 2023/02/23 12:51:20 UTC
[GitHub] [incubator-celeborn] FMX opened a new pull request, #1270: [CELEBORN-333][FLINK] Fix memory leak.
FMX opened a new pull request, #1270:
URL: https://github.com/apache/incubator-celeborn/pull/1270
### What changes were proposed in this pull request?
1. Solve buffer dispatcher memory leak in worker.
2. Solve transportFrameDecoder memory leak in client.
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
NO.
### How was this patch tested?
UT and cluster.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@celeborn.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [incubator-celeborn] RexXiong commented on a diff in pull request #1270: [CELEBORN-335][FLINK] Fix memory leak.
Posted by "RexXiong (via GitHub)" <gi...@apache.org>.
RexXiong commented on code in PR #1270:
URL: https://github.com/apache/incubator-celeborn/pull/1270#discussion_r1115752825
##########
client-flink/common/src/main/java/org/apache/celeborn/plugin/flink/network/TransportFrameDecoderWithBufferSupplier.java:
##########
@@ -144,6 +148,9 @@ public void channelRead(ChannelHandlerContext ctx, Object data) {
}
}
}
+ } catch (Exception e) {
Review Comment:
Do this need to notify the server?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@celeborn.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [incubator-celeborn] FMX closed pull request #1270: [CELEBORN-335][FLINK] Fix memory leak.
Posted by "FMX (via GitHub)" <gi...@apache.org>.
FMX closed pull request #1270: [CELEBORN-335][FLINK] Fix memory leak.
URL: https://github.com/apache/incubator-celeborn/pull/1270
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@celeborn.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [incubator-celeborn] RexXiong commented on a diff in pull request #1270: [CELEBORN-335][FLINK] Fix memory leak.
Posted by "RexXiong (via GitHub)" <gi...@apache.org>.
RexXiong commented on code in PR #1270:
URL: https://github.com/apache/incubator-celeborn/pull/1270#discussion_r1117927255
##########
common/src/main/java/org/apache/celeborn/common/network/server/memory/ReadBufferDispatcher.java:
##########
@@ -48,7 +48,7 @@ public void addBufferRequest(ReadBufferRequest request) {
public void recycle(ByteBuf buf) {
int bufferSize = buf.capacity();
- buf.release();
+ buf.release(buf.refCnt());
memoryManager.changeReadBufferCounter(-1 * bufferSize);
Review Comment:
We can delete buf.retain() when allocators new buffer as we always retain it when we use the buffer.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@celeborn.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [incubator-celeborn] FMX commented on a diff in pull request #1270: [CELEBORN-335][FLINK] Fix memory leak.
Posted by "FMX (via GitHub)" <gi...@apache.org>.
FMX commented on code in PR #1270:
URL: https://github.com/apache/incubator-celeborn/pull/1270#discussion_r1116852226
##########
client-flink/common/src/main/java/org/apache/celeborn/plugin/flink/network/TransportFrameDecoderWithBufferSupplier.java:
##########
@@ -144,6 +148,9 @@ public void channelRead(ChannelHandlerContext ctx, Object data) {
}
}
}
+ } catch (Exception e) {
+ logger.error(
+ "decode msg failed type {} curMsg {} size {}", curType, curMsg, nettyBuf.readableBytes());
Review Comment:
Yes,it should.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@celeborn.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [incubator-celeborn] FMX commented on a diff in pull request #1270: [CELEBORN-335][FLINK] Fix memory leak.
Posted by "FMX (via GitHub)" <gi...@apache.org>.
FMX commented on code in PR #1270:
URL: https://github.com/apache/incubator-celeborn/pull/1270#discussion_r1116853303
##########
client-flink/common/src/main/java/org/apache/celeborn/plugin/flink/network/TransportFrameDecoderWithBufferSupplier.java:
##########
@@ -144,6 +148,9 @@ public void channelRead(ChannelHandlerContext ctx, Object data) {
}
}
}
+ } catch (Exception e) {
Review Comment:
NO. Because read data won't expect rpc response. And other RPC decode failure means that the data is broken. We might can not send a response to the server.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@celeborn.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [incubator-celeborn] codecov[bot] commented on pull request #1270: [CELEBORN-335][FLINK] Fix memory leak.
Posted by "codecov[bot] (via GitHub)" <gi...@apache.org>.
codecov[bot] commented on PR #1270:
URL: https://github.com/apache/incubator-celeborn/pull/1270#issuecomment-1441736573
# [Codecov](https://codecov.io/gh/apache/incubator-celeborn/pull/1270?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#1270](https://codecov.io/gh/apache/incubator-celeborn/pull/1270?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (e64f639) into [main](https://codecov.io/gh/apache/incubator-celeborn/commit/af9e8366c917c9093b47b2fd8230e366f777af1f?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (af9e836) will **increase** coverage by `0.03%`.
> The diff coverage is `0.00%`.
> :exclamation: Current head e64f639 differs from pull request most recent head a200f7d. Consider uploading reports for the commit a200f7d to get more accurate results
```diff
@@ Coverage Diff @@
## main #1270 +/- ##
============================================
+ Coverage 27.08% 27.11% +0.03%
- Complexity 810 813 +3
============================================
Files 215 215
Lines 18368 18368
Branches 1999 1999
============================================
+ Hits 4973 4978 +5
+ Misses 13067 13065 -2
+ Partials 328 325 -3
```
| [Impacted Files](https://codecov.io/gh/apache/incubator-celeborn/pull/1270?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [...on/network/server/memory/ReadBufferDispatcher.java](https://codecov.io/gh/apache/incubator-celeborn/pull/1270?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9jZWxlYm9ybi9jb21tb24vbmV0d29yay9zZXJ2ZXIvbWVtb3J5L1JlYWRCdWZmZXJEaXNwYXRjaGVyLmphdmE=) | `0.00% <0.00%> (ø)` | |
| [...oy/worker/congestcontrol/CongestionController.java](https://codecov.io/gh/apache/incubator-celeborn/pull/1270?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-d29ya2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9jZWxlYm9ybi9zZXJ2aWNlL2RlcGxveS93b3JrZXIvY29uZ2VzdGNvbnRyb2wvQ29uZ2VzdGlvbkNvbnRyb2xsZXIuamF2YQ==) | `76.86% <0.00%> (-0.92%)` | :arrow_down: |
| [...celeborn/service/deploy/master/SlotsAllocator.java](https://codecov.io/gh/apache/incubator-celeborn/pull/1270?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-bWFzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9jZWxlYm9ybi9zZXJ2aWNlL2RlcGxveS9tYXN0ZXIvU2xvdHNBbGxvY2F0b3IuamF2YQ==) | `71.73% <0.00%> (+2.46%)` | :arrow_up: |
:mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@celeborn.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [incubator-celeborn] zhongqiangczq commented on a diff in pull request #1270: [CELEBORN-335][FLINK] Fix memory leak.
Posted by "zhongqiangczq (via GitHub)" <gi...@apache.org>.
zhongqiangczq commented on code in PR #1270:
URL: https://github.com/apache/incubator-celeborn/pull/1270#discussion_r1116781258
##########
client-flink/common/src/main/java/org/apache/celeborn/plugin/flink/network/TransportFrameDecoderWithBufferSupplier.java:
##########
@@ -144,6 +148,9 @@ public void channelRead(ChannelHandlerContext ctx, Object data) {
}
}
}
+ } catch (Exception e) {
+ logger.error(
+ "decode msg failed type {} curMsg {} size {}", curType, curMsg, nettyBuf.readableBytes());
Review Comment:
logger.error("", nettyBuf.readableBytes() => logger.error("", nettyBuf.readableBytes(),e)?, we can see the exception cause
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@celeborn.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org