You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by "lhotari (via GitHub)" <gi...@apache.org> on 2023/10/26 15:27:04 UTC

[PR] [improve][ci] Detect test thread leaks in CI builds and add tooling for resource leak investigation [pulsar]

lhotari opened a new pull request, #21450:
URL: https://github.com/apache/pulsar/pull/21450

   ### Motivation
   
   Tests should clean resources that are created by each test. 
   If this doesn't happen, it will cause a memory leak and also in some rare causes cause CPU resources to be consumed on threads that were created tests that have already been executed. Memory leaks will also eventually consume more CPU when the available heap memory in the Test JVM becomes too small that the GC could operate efficiently.  
   
   The main source of memory leaks in Pulsar tests are caused by thread leaks. Since this problem is hard to detect, there have been a lot of tests that leak threads. There continue to be quite a few remaining issues. 
   
   This PR introduces a reliable way to detect the thread leaks by improving the solution introduced in #10195 . The previous solution reported a lot of false positives since it didn't wait for threads to complete in the thread leak detection. 
   
   In Pulsar CI, there will be warnings in the summary view for classes that leak threads. This will help detect the thread leaks early. It would be possible to also fail the build when a new thread leak is introduced. However, that's something that could be changed later, after the solution introduced in this PR is proven.
   
   In addition to the improvements to thread leak detection in tests, there are tools to investigate ordinary memory leaks in tests. There's an option in the Pulsar CI workflow to collect heap dumps or heap histograms when the test JVM exits. This information will help find out the source of memory leaks when it's not caused by a thread leak. 
   
   The heap dump tooling could also be useful in some tests as a debugging tool. A heap dump contains a snapshot of the heap state and that is very useful in debugging certain type of issues. Heap dumps can be investigated with Eclipse MAT or other tools. Eclipse MAT contains a query language to summarize information. There's also Calcite plugin to Eclipse MAT which makes it possible to use SQL to query and summarize information in the heap dump. Sometimes this is very useful.
   
   ### Modifications
   
   - improve ThreadLeakDetectorListener
     - improve detection algorithm to wait for threads to complete
     - ignore known thread pools that aren't created by the tests
     - add reporting to file
   - add reporting to Pulsar CI based on files that ThreadLeakDetectorListener created
   - add TraceTestResourceCleanupListener which traces test resource cleanup by creating a thread dump, heap histogram and heap dump before the TestNG JVM exits.
   
   ### Documentation
   
   <!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->
   
   - [ ] `doc` <!-- Your PR contains doc changes. -->
   - [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
   - [x] `doc-not-needed` <!-- Your PR changes do not impact docs -->
   - [ ] `doc-complete` <!-- Docs have been already added -->


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve][ci] Detect test thread leaks in CI builds and add tooling for resource leak investigation [pulsar]

Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #21450:
URL: https://github.com/apache/pulsar/pull/21450#issuecomment-1783981705

   ## [Codecov](https://app.codecov.io/gh/apache/pulsar/pull/21450?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
   > Merging [#21450](https://app.codecov.io/gh/apache/pulsar/pull/21450?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (b121aae) into [master](https://app.codecov.io/gh/apache/pulsar/commit/b010bab6b0d2a304dabdbe6bd825226de0358349?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (b010bab) will **increase** coverage by `36.44%`.
   > Report is 1 commits behind head on master.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree graph](https://app.codecov.io/gh/apache/pulsar/pull/21450/graphs/tree.svg?width=650&height=150&src=pr&token=acYqCpsK9J&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)](https://app.codecov.io/gh/apache/pulsar/pull/21450?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master   #21450       +/-   ##
   =============================================
   + Coverage     36.78%   73.22%   +36.44%     
   - Complexity    12235    32583    +20348     
   =============================================
     Files          1713     1890      +177     
     Lines        130758   140357     +9599     
     Branches      14250    15425     +1175     
   =============================================
   + Hits          48095   102774    +54679     
   + Misses        76321    29499    -46822     
   - Partials       6342     8084     +1742     
   ```
   
   | [Flag](https://app.codecov.io/gh/apache/pulsar/pull/21450/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [inttests](https://app.codecov.io/gh/apache/pulsar/pull/21450/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `24.18% <0.00%> (-0.05%)` | :arrow_down: |
   | [systests](https://app.codecov.io/gh/apache/pulsar/pull/21450/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `24.69% <0.00%> (-0.04%)` | :arrow_down: |
   | [unittests](https://app.codecov.io/gh/apache/pulsar/pull/21450/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `72.52% <0.00%> (+40.58%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Files](https://app.codecov.io/gh/apache/pulsar/pull/21450?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [.../org/apache/pulsar/common/util/ThreadDumpUtil.java](https://app.codecov.io/gh/apache/pulsar/pull/21450?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cHVsc2FyLWNvbW1vbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcHVsc2FyL2NvbW1vbi91dGlsL1RocmVhZER1bXBVdGlsLmphdmE=) | `59.45% <0.00%> (+59.45%)` | :arrow_up: |
   
   ... and [1453 files with indirect coverage changes](https://app.codecov.io/gh/apache/pulsar/pull/21450/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve][ci] Detect test thread leaks in CI builds and add tooling for resource leak investigation [pulsar]

Posted by "lhotari (via GitHub)" <gi...@apache.org>.
lhotari merged PR #21450:
URL: https://github.com/apache/pulsar/pull/21450


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve][ci] Detect test thread leaks in CI builds and add tooling for resource leak investigation [pulsar]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #21450:
URL: https://github.com/apache/pulsar/pull/21450#issuecomment-1783982060

   @lhotari Please add the following content to your PR description and select a checkbox:
   ```
   - [ ] `doc` <!-- Your PR contains doc changes -->
   - [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
   - [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
   - [ ] `doc-complete` <!-- Docs have been already added -->
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org