Posted to issues@lucene.apache.org by "Andrzej Bialecki (Jira)" <ji...@apache.org> on 2021/01/11 08:13:00 UTC

[jira] [Comment Edited] (SOLR-15056) CPU circuit breaker needs to use CPU utilization, not Unix load average

    [ https://issues.apache.org/jira/browse/SOLR-15056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262468#comment-17262468 ] 

Andrzej Bialecki edited comment on SOLR-15056 at 1/11/21, 8:12 AM:
-------------------------------------------------------------------

{{systemCpuLoad}} is already supported and returned as one of the metrics. It comes from the (somewhat convoluted) code in {{MetricUtils.addMxBeanMetrics}}, which tries every known MXBean implementation and accumulates any unique bean properties they expose.

For example:
{code}
http://localhost:8983/solr/admin/metrics?group=jvm&prefix=os

{
    "responseHeader": {
        "status": 0,
        "QTime": 1
    },
    "metrics": {
        "solr.jvm": {
            "os.arch": "x86_64",
            "os.availableProcessors": 12,
            "os.committedVirtualMemorySize": 8402419712,
            "os.freePhysicalMemorySize": 41504768,
            "os.freeSwapSpaceSize": 804519936,
            "os.maxFileDescriptorCount": 8192,
            "os.name": "Mac OS X",
            "os.openFileDescriptorCount": 195,
            "os.processCpuLoad": 0.0017402379609634876,
            "os.processCpuTime": 10492010000,
            "os.systemCpuLoad": 0.1268950796343933,
            "os.systemLoadAverage": 4.00439453125,
            "os.totalPhysicalMemorySize": 34359738368,
            "os.totalSwapSpaceSize": 7516192768,
            "os.version": "10.16"
        }
    }
}
{code}
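
To make the difference between the two "load" values in that response concrete, here is a minimal sketch that reads both directly from the JVM. It is not the Solr code path: {{OsLoadProbe}} and its method names are hypothetical, and it assumes a HotSpot-style JVM where the platform bean also implements {{com.sun.management.OperatingSystemMXBean}}.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

/** Hypothetical helper (not Solr code) that reads both "load" values from the JVM. */
final class OsLoadProbe {

    /** Unix load average over the last minute; unbounded, negative if unavailable. */
    static double systemLoadAverage() {
        return ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage();
    }

    /** Whole-system CPU usage, documented as [0.0, 1.0]; negative if unavailable. */
    @SuppressWarnings("deprecation") // deprecated on newer JDKs, but still present
    static double systemCpuLoad() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // Assumption: the extended com.sun.management interface is available on this JVM.
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            return ((com.sun.management.OperatingSystemMXBean) os).getSystemCpuLoad();
        }
        return -1.0; // mirror the "negative means unavailable" convention
    }

    public static void main(String[] args) {
        System.out.println("os.systemLoadAverage = " + systemLoadAverage());
        System.out.println("os.systemCpuLoad     = " + systemCpuLoad());
    }
}
```

On a busy multi-core host the first value can be well above 1.0 while the second stays within the documented 0.0 to 1.0 range, which matches the sample response above (4.0 versus 0.127).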


> CPU circuit breaker needs to use CPU utilization, not Unix load average
> -----------------------------------------------------------------------
>
>                 Key: SOLR-15056
>                 URL: https://issues.apache.org/jira/browse/SOLR-15056
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: metrics
>    Affects Versions: 8.7
>            Reporter: Walter Underwood
>            Priority: Major
>
> The config range, 50% to 95%, assumes that the circuit breaker is triggered by a CPU utilization metric that goes from 0% to 100%. But the code uses the metric OperatingSystemMXBean.getSystemLoadAverage(). That is an average of the count of processes waiting to run, so it is effectively unbounded: I've seen it as high as 50 to 100. It is not bounded by 1.0 (100%).
> A good limit for load average would need to be aware of the number of CPUs available to the JVM. A load average of 8 is no problem for a 32 CPU host. It is a critical situation for a 2 CPU host.
> Also, load average is a Unix OS metric. I don't know if it is even available on Windows.
> Instead, use a CPU utilization metric that goes from 0.0 to 1.0. A good choice is com.sun.management.OperatingSystemMXBean.getSystemCpuLoad(). This name also uses "load", but it is a usage metric.
> From the Javadoc:
> > Returns the "recent cpu usage" for the whole system. This value is a double in the [0.0,1.0] interval. A value of 0.0 means that all CPUs were idle during the recent period of time observed, while a value of 1.0 means that all CPUs were actively running 100% of the time during the recent period being observed. All values betweens 0.0 and 1.0 are possible depending of the activities going on in the system. If the system recent cpu usage is not available, the method returns a negative value.
> https://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html#getSystemCpuLoad()
> Also update the documentation to explain which JMX metrics are used for the memory and CPU circuit breakers.
>  
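
The reasoning in the issue description can be sketched as two small checks. This is an illustration only, not the Solr circuit breaker implementation: {{CpuBreakerSketch}}, {{loadPerCpu}} and {{isTripped}} are hypothetical names, and treating the threshold as a fraction between 0.0 and 1.0 is an assumption.

```java
/** Hypothetical sketch of the two approaches discussed in the issue (not Solr code). */
final class CpuBreakerSketch {

    /** Load average only becomes comparable across hosts once divided by the CPU count. */
    static double loadPerCpu(double loadAverage, int availableProcessors) {
        if (availableProcessors <= 0) {
            throw new IllegalArgumentException("availableProcessors must be positive");
        }
        return loadAverage / availableProcessors;
    }

    /**
     * Decide on a CPU utilization value in [0.0, 1.0].
     * Assumption: a negative (or NaN) reading means "not available" and never trips the breaker.
     */
    static boolean isTripped(double systemCpuLoad, double threshold) {
        if (Double.isNaN(systemCpuLoad) || systemCpuLoad < 0.0) {
            return false;
        }
        return systemCpuLoad > threshold;
    }

    public static void main(String[] args) {
        // The same load average of 8 means very different things on 32 and on 2 CPUs.
        System.out.println(loadPerCpu(8.0, 32));
        System.out.println(loadPerCpu(8.0, 2));
        // A utilization reading compares directly against a 95% threshold.
        System.out.println(isTripped(0.1268950796343933, 0.95));
    }
}
```

With a utilization metric the 50% to 95% configuration range maps directly onto the measured value, whereas the load average needs the CPU count before any fixed threshold is meaningful.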


