Posted to commits@cloudstack.apache.org by "phsm (via GitHub)" <gi...@apache.org> on 2023/06/05 14:29:23 UTC

[GitHub] [cloudstack] phsm opened a new issue, #7591: Changing CPU overprovisioning factor breaks prometheus and listHosts usage metrics

phsm opened a new issue, #7591:
URL: https://github.com/apache/cloudstack/issues/7591

   
   ##### ISSUE TYPE
    * Bug Report
   
   ##### COMPONENT NAME
   ~~~
   Prometheus Exporter
   API
   ~~~
   
   ##### CLOUDSTACK VERSION
   
   ~~~
   4.17.0
   4.18.0
   ~~~
   
   ##### CONFIGURATION
   N/A
   
   ##### OS / ENVIRONMENT
   N/A
   
   ##### SUMMARY
   When the CPU overcommit factor is changed, the Prometheus exporter metric "cloudstack_host_cpu_usage_mhz_total" and the cpuused field of the listHosts API response both seem to be multiplied by the new overcommit factor.
   
   The actual "used" metrics should not be affected by the overcommit factor. The overcommit factor should only virtually increase the capacity of the node; it should not affect the usage metric.
   
   
   ##### STEPS TO REPRODUCE
   ~~~
   1. Empty a hypervisor of VMs, VRs, system VMs, etc., so that no virtual machines are running on it.
   2. Pick a virtual machine to start on that hypervisor. Before starting it, note its number of CPU cores and the MHz per core, e.g. two cores at 500 MHz each.
   3. After starting the test virtual machine on the test hypervisor, check the Prometheus metric cloudstack_host_cpu_usage_mhz_total{hostname=<your test hypervisor>}. It should show the CPU MHz used on that hypervisor: cpu_number * cpu_mhz, e.g. 1000. This is the correct value.
   4. Now change the cluster setting cpu.overprovisioning.factor to a new value, e.g. 4.
   5. The metric cloudstack_host_cpu_usage_mhz_total{hostname=<your test hypervisor>} now shows a different value, presumably calculated by the formula: cpu_number * cpu_mhz * (new_overprovisioning_factor - old_overprovisioning_factor)
   6. If you stop and start the test VM, cloudstack_host_cpu_usage_mhz_total goes back to normal.
   
   The same steps reproduce the problem in the listHosts API response, field cpuused.
   If you start a VM and then change the overprovisioning factor, the field contains an incorrect value (most visibly with a ridiculously high overprovisioning factor, such as 1000).
   ~~~
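
The arithmetic in the steps above can be sketched in Python. This is only an illustration of the report, not CloudStack code: the function names and the 8000 MHz host are hypothetical, the old factor of 1 is an assumption (the steps do not state it), and the last function encodes the reporter's unconfirmed guess at how the distorted value is produced.

```python
def requested_cpu_mhz(cores: int, mhz_per_core: int) -> int:
    """CPU MHz a running VM accounts for: cpu_number * cpu_mhz."""
    return cores * mhz_per_core


def overprovisioned_capacity_mhz(physical_mhz: int, factor: float) -> float:
    """Capacity a host advertises once the overprovisioning factor is applied."""
    return physical_mhz * factor


def suspected_reported_mhz(cores: int, mhz_per_core: int,
                           old_factor: float, new_factor: float) -> float:
    """Reporter's guess (unconfirmed) at the value shown after a factor change."""
    return cores * mhz_per_core * (new_factor - old_factor)


if __name__ == "__main__":
    # Test VM from the steps: two cores at 500 MHz each.
    used = requested_cpu_mhz(2, 500)

    # Expected behavior: only the capacity scales with the factor; usage stays put.
    for factor in (1.0, 4.0):  # old factor of 1 is an assumption
        capacity = overprovisioned_capacity_mhz(8000, factor)  # hypothetical host
        print(f"factor={factor}: capacity={capacity} MHz, expected usage={used} MHz")

    # Reported behavior: usage itself changes after the factor is modified.
    print("suspected reported usage:", suspected_reported_mhz(2, 500, 1.0, 4.0))
```

Comparing the value of `requested_cpu_mhz` with what the exporter or listHosts returns after step 4 is one way to confirm whether a given installation is affected.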
   
   
   ##### EXPECTED RESULTS
   
   ~~~
   The Prometheus metric cloudstack_host_cpu_usage_mhz_total and the cpuused field of the listHosts API response should not include the overprovisioning factor in their calculation, because usage metrics report real usage.
   ~~~
   
   ##### ACTUAL RESULTS
   ~~~
   The metric is reported without the overprovisioning factor in its calculation when a VM starts, then becomes distorted when you change the overprovisioning factor.
   ~~~
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@cloudstack.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] Changing CPU overprovisioning factor breaks prometheus and listHosts usage metrics [cloudstack]

Posted by "sureshanaparti (via GitHub)" <gi...@apache.org>.
sureshanaparti closed issue #7591: Changing CPU overprovisioning factor breaks prometheus and listHosts usage metrics 
URL: https://github.com/apache/cloudstack/issues/7591




Re: [I] Changing CPU overprovisioning factor breaks prometheus and listHosts usage metrics [cloudstack]

Posted by "sureshanaparti (via GitHub)" <gi...@apache.org>.
sureshanaparti commented on issue #7591:
URL: https://github.com/apache/cloudstack/issues/7591#issuecomment-2103996266

   Re-opened; it seems #7629 fixes a different issue, as mentioned [here](https://github.com/apache/cloudstack/issues/7625#issuecomment-1592948525).




[GitHub] [cloudstack] phsm commented on issue #7591: Changing CPU overprovisioning factor breaks prometheus and listHosts usage metrics

Posted by "phsm (via GitHub)" <gi...@apache.org>.
phsm commented on issue #7591:
URL: https://github.com/apache/cloudstack/issues/7591#issuecomment-1584587876

   @NuxRo Thanks for reviewing this bug report.
   
   I think you've misunderstood what the actual bug is here.
   
   The bug is that changing the overprovisioning factor affects the cloudstack_host_cpu_usage_mhz_total metric, when it shouldn't.
   
   1. Let's say you have an overprovisioning factor of 10.
   2. When you start a VM with 2 cores, 2000 MHz, on an empty hypervisor, the metric cloudstack_host_cpu_usage_mhz_total for that hypervisor shows the value 2000. This is correct behavior: the metric shows the real usage, not multiplied by the overprovisioning factor.
   3. When you **change** the overprovisioning factor while the VM is running, the metric value becomes multiplied by what seems to be `(new_overprovision_factor - old_overprovision_factor)`.
   
   So the behavior is inconsistent: the usage metric is not affected by the overprovisioning factor when you start a VM, but once you change the factor, the VM's reported usage becomes affected by it.
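
A minimal Python sketch of how the inconsistency described above could be detected from two scrapes of the exporter, one taken before and one after the factor change. The helper names and sample lines are made up for illustration, and only the `hostname` label mentioned in this thread is assumed; real exporter output may carry additional labels.

```python
import re
from typing import Dict, Tuple

METRIC = "cloudstack_host_cpu_usage_mhz_total"

# Matches one labelled sample line of the Prometheus text exposition format.
_SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)'
)
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def usage_by_host(exposition: str) -> Dict[str, float]:
    """Map hostname -> reported usage in MHz for the metric of interest."""
    usage: Dict[str, float] = {}
    for line in exposition.splitlines():
        match = _SAMPLE.match(line.strip())
        if match is None or match.group("name") != METRIC:
            continue  # comments, blank lines, and other metrics are skipped
        labels = dict(_LABEL.findall(match.group("labels")))
        host = labels.get("hostname")
        if host is not None:
            usage[host] = float(match.group("value"))
    return usage


def drifted_hosts(before: Dict[str, float], after: Dict[str, float],
                  tolerance: float = 0.0) -> Dict[str, Tuple[float, float]]:
    """Hosts whose usage changed although no VM was started or stopped."""
    return {
        host: (before[host], after[host])
        for host in before.keys() & after.keys()
        if abs(after[host] - before[host]) > tolerance
    }


if __name__ == "__main__":
    # Hypothetical scrapes around a factor change with the same VMs running.
    scrape_before = 'cloudstack_host_cpu_usage_mhz_total{hostname="hv1"} 2000.00'
    scrape_after = 'cloudstack_host_cpu_usage_mhz_total{hostname="hv1"} 6000.00'
    print(drifted_hosts(usage_by_host(scrape_before), usage_by_host(scrape_after)))
```

Any host returned by `drifted_hosts` while its set of running VMs is unchanged would show the behavior reported here.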




[GitHub] [cloudstack] NuxRo commented on issue #7591: Changing CPU overprovisioning factor breaks prometheus and listHosts usage metrics

Posted by "NuxRo (via GitHub)" <gi...@apache.org>.
NuxRo commented on issue #7591:
URL: https://github.com/apache/cloudstack/issues/7591#issuecomment-1586993746

   @phsm Yes, it is a problem. This behaviour is unfortunate.




[GitHub] [cloudstack] NuxRo commented on issue #7591: Changing CPU overprovisioning factor breaks prometheus and listHosts usage metrics

Posted by "NuxRo (via GitHub)" <gi...@apache.org>.
NuxRo commented on issue #7591:
URL: https://github.com/apache/cloudstack/issues/7591#issuecomment-1581070059

   @phsm The docs do say that you need to stop and start the VMs:
   
   http://docs.cloudstack.apache.org/en/latest/adminguide/hosts.html?highlight=over-provisioning#setting-over-provisioning-factors
   
   > Only VMs deployed after the change are affected by the new setting. If you want VMs deployed before the change to adopt the new over-provisioning factor, you must stop and restart the VMs. When this is done, CloudStack recalculates or scales the used and reserved capacities based on the new over-provisioning factors, to ensure that CloudStack is correctly tracking the amount of free capacity.




Re: [I] Changing CPU overprovisioning factor breaks prometheus and listHosts usage metrics [cloudstack]

Posted by "sureshanaparti (via GitHub)" <gi...@apache.org>.
sureshanaparti commented on issue #7591:
URL: https://github.com/apache/cloudstack/issues/7591#issuecomment-2103979411

   Closing this, as the fix PR got merged.




Re: [I] Changing CPU overprovisioning factor breaks prometheus and listHosts usage metrics [cloudstack]

Posted by "soreana (via GitHub)" <gi...@apache.org>.
soreana commented on issue #7591:
URL: https://github.com/apache/cloudstack/issues/7591#issuecomment-2105732943

   > Re-opened; it seems #7629 fixes a different issue, as mentioned [here](https://github.com/apache/cloudstack/issues/7625#issuecomment-1592948525).
   
   Yes @sureshanaparti, it is a different issue.

