Posted to commits@cloudstack.apache.org by GitBox <gi...@apache.org> on 2020/08/27 19:08:43 UTC

[GitHub] [cloudstack] GabrielBrascher opened a new pull request #4291: Manage influxDB Batches avoiding OutOfMemory Exception

GabrielBrascher opened a new pull request #4291:
URL: https://github.com/apache/cloudstack/pull/4291


   ## Description
   <!--- Describe your changes in detail -->
   After a few hours of running CloudStack 4.13.1.0 with stats data stored in InfluxDB, CloudStack hangs because of an `OutOfMemoryError` raised at [com.cloud.server.StatsCollector.writeBatches(StatsCollector.java:1510)](https://github.com/apache/cloudstack/blob/1da76d27f13e045ac88e6c494d604d6133486c9c/server/src/main/java/com/cloud/server/StatsCollector.java#L1510):
   ```
   2020-08-12 21:19:00,972 ERROR [c.c.s.StatsCollector] (StatsCollector-6:ctx-0a4cfe6a) (logid:03a7ba48) Error trying to retrieve host stats
   java.lang.OutOfMemoryError: unable to create new native thread
           ...
           at org.influxdb.impl.BatchProcessor.<init>(BatchProcessor.java:294)
           at org.influxdb.impl.BatchProcessor$Builder.build(BatchProcessor.java:201)
           at org.influxdb.impl.InfluxDBImpl.enableBatch(InfluxDBImpl.java:311)
           at com.cloud.server.StatsCollector.writeBatches(StatsCollector.java:1510)
           at com.cloud.server.StatsCollector$AbstractStatsCollector.sendMetricsToInfluxdb(StatsCollector.java:1351)
           at com.cloud.server.StatsCollector$HostCollector.runInContext(StatsCollector.java:522)
   ```
   
   **Context on InfluxDB Batch:** Enabling batching on InfluxDB speeds up writes, but it requires caution to avoid zombie threads.
   
   **Solution:** The error happens because the batching feature creates an internal thread pool that needs to be shut down explicitly; therefore, it is important to call `influxDB.close()`.
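   
   The leak pattern and the fix can be sketched with the JDK alone. This is only an illustration under stated assumptions: `BatchingClient` is a hypothetical stand-in for a client that starts a worker thread when batching is enabled, not the influxdb-java API, and `writeBatches` here is a simplified analogue of the CloudStack method, not its actual code.
   
   ```java
   import java.util.List;
   import java.util.concurrent.ExecutorService;
   import java.util.concurrent.Executors;
   import java.util.concurrent.TimeUnit;
   import java.util.concurrent.atomic.AtomicInteger;
   
   final class BatchCloseSketch {
   
       // Hypothetical stand-in for a batching client (NOT the influxdb-java API).
       // Each instance owns a worker thread, like a client with batching enabled.
       static final class BatchingClient implements AutoCloseable {
           // Number of clients created but not yet closed.
           static final AtomicInteger OPEN = new AtomicInteger();
   
           private final ExecutorService flusher = Executors.newSingleThreadExecutor();
           private final AtomicInteger written = new AtomicInteger();
   
           BatchingClient() {
               OPEN.incrementAndGet();
           }
   
           // Queue a point; the worker thread "writes" it asynchronously.
           void write(String point) {
               flusher.submit(() -> written.incrementAndGet());
           }
   
           int written() {
               return written.get();
           }
   
           // Run the queued writes, then stop the worker thread.
           @Override
           public void close() {
               flusher.shutdown();
               try {
                   if (!flusher.awaitTermination(5, TimeUnit.SECONDS)) {
                       flusher.shutdownNow();
                   }
               } catch (InterruptedException e) {
                   flusher.shutdownNow();
                   Thread.currentThread().interrupt();
               }
               OPEN.decrementAndGet();
           }
       }
   
       // Simplified analogue of a per-collection write: one client per call.
       // Without the finally block, every call would leave a live thread behind.
       static int writeBatches(List<String> points) {
           BatchingClient client = new BatchingClient();
           try {
               for (String point : points) {
                   client.write(point);
               }
           } finally {
               client.close();
           }
           return client.written();
       }
   }
   ```
   
   In this sketch, dropping the `finally` block makes `BatchingClient.OPEN` grow by one on every call, which mirrors the unbounded native thread growth behind the error above.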
   
   ## Types of changes
   <!--- What types of changes does your code introduce? Put an `x` in all the boxes that apply: -->
   - [ ] Breaking change (fix or feature that would cause existing functionality to change)
   - [ ] New feature (non-breaking change which adds functionality)
   - [x] Bug fix (non-breaking change which fixes an issue)
   - [ ] Enhancement (improves an existing feature and functionality)
   - [ ] Cleanup (Code refactoring and cleanup, that may add test cases)
   
   ## How Has This Been Tested?
   <!-- Please describe in detail how you tested your changes. -->
   <!-- Include details of your testing environment, and the tests you ran to -->
   <!-- see how your change affects other areas of the code, etc. -->
   
   - The `OutOfMemoryError` occurs in less than 10 hours on the staging environment.
   - After applying the proposed fix on the same test environment with the same workload, I waited a week and the exception was not thrown. Memory consumption looks good.
   
   <!-- Please read the [CONTRIBUTING](https://github.com/apache/cloudstack/blob/master/CONTRIBUTING.md) document -->
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [cloudstack] rhtyd commented on pull request #4291: Manage influxDB Batches avoiding OutOfMemory Exception

Posted by GitBox <gi...@apache.org>.
rhtyd commented on pull request #4291:
URL: https://github.com/apache/cloudstack/pull/4291#issuecomment-682354644


   @blueorangutan test





[GitHub] [cloudstack] rhtyd commented on pull request #4291: Manage influxDB Batches avoiding OutOfMemory Exception

Posted by GitBox <gi...@apache.org>.
rhtyd commented on pull request #4291:
URL: https://github.com/apache/cloudstack/pull/4291#issuecomment-684747679


   LGTM, merging based on smoke tests and confirmation from @wido





[GitHub] [cloudstack] blueorangutan commented on pull request #4291: Manage influxDB Batches avoiding OutOfMemory Exception

Posted by GitBox <gi...@apache.org>.
blueorangutan commented on pull request #4291:
URL: https://github.com/apache/cloudstack/pull/4291#issuecomment-682354718


   @rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests





[GitHub] [cloudstack] blueorangutan commented on pull request #4291: Manage influxDB Batches avoiding OutOfMemory Exception

Posted by GitBox <gi...@apache.org>.
blueorangutan commented on pull request #4291:
URL: https://github.com/apache/cloudstack/pull/4291#issuecomment-682152876


   @GabrielBrascher a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.





[GitHub] [cloudstack] blueorangutan commented on pull request #4291: Manage influxDB Batches avoiding OutOfMemory Exception

Posted by GitBox <gi...@apache.org>.
blueorangutan commented on pull request #4291:
URL: https://github.com/apache/cloudstack/pull/4291#issuecomment-682162627


   Packaging result: ✔centos7 ✖centos8 ✔debian. JID-1830





[GitHub] [cloudstack] rhtyd merged pull request #4291: Manage influxDB Batches avoiding OutOfMemory Exception

Posted by GitBox <gi...@apache.org>.
rhtyd merged pull request #4291:
URL: https://github.com/apache/cloudstack/pull/4291


   





[GitHub] [cloudstack] GabrielBrascher commented on pull request #4291: Manage influxDB Batches avoiding OutOfMemory Exception

Posted by GitBox <gi...@apache.org>.
GabrielBrascher commented on pull request #4291:
URL: https://github.com/apache/cloudstack/pull/4291#issuecomment-682152508


   @blueorangutan package





[GitHub] [cloudstack] blueorangutan commented on pull request #4291: Manage influxDB Batches avoiding OutOfMemory Exception

Posted by GitBox <gi...@apache.org>.
blueorangutan commented on pull request #4291:
URL: https://github.com/apache/cloudstack/pull/4291#issuecomment-683031259


   <b>Trillian test result (tid-2585)</b>
   Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
   Total time taken: 40151 seconds
   Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4291-t2585-kvm-centos7.zip
   Intermittent failure detected: /marvin/tests/smoke/test_privategw_acl.py
   Intermittent failure detected: /marvin/tests/smoke/test_vpc_vpn.py
   Smoke tests completed. 76 look OK, 1 have error(s)
   Only failed tests results shown below:
   
   
   Test | Result | Time (s) | Test File
   --- | --- | --- | ---
   test_02_vpc_privategw_static_routes | `Failure` | 343.46 | test_privategw_acl.py
   test_03_vpc_privategw_restart_vpc_cleanup | `Failure` | 240.27 | test_privategw_acl.py
   test_04_rvpc_privategw_static_routes | `Failure` | 331.27 | test_privategw_acl.py
   

