Posted to solr-user@lucene.apache.org by Walter Underwood <wu...@wunderwood.org> on 2020/12/16 18:41:17 UTC

CPU and memory circuit breaker documentation issues

In https://lucene.apache.org/solr/guide/8_7/circuit-breakers.html

URL to Wikipedia is broken, but that doesn’t matter, because that article is about a different metric. The Unix “load average” is the length of the run queue, the number of processes or threads waiting to run. That can go much, much higher than 1.0. In a high load system, I’ve seen it at 2X the number of CPUs or higher.

Remove that link, it is misleading.

The page should list the JMX metrics that are used for this. I’m guessing this uses OperatingSystemMXBean.getSystemCpuLoad(). That metric goes from 0.0 to 1.0.

https://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html

I can see where the “load average” and “getSystemCpuLoad” names cause confusion, but this should be correct in the documents.
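
To make the distinction concrete, here is a minimal Java sketch that reads both values side by side. The class name `CpuMetricsProbe` and its method names are made up for illustration. The method on `com.sun.management.OperatingSystemMXBean` is spelled `getSystemCpuLoad()`; it is only present on JVMs that expose that extension bean, and newer JDKs deprecate it in favor of `getCpuLoad()`.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class CpuMetricsProbe {

    /** Unix load average over the last minute; negative if the platform does not provide it. */
    public static double loadAverage() {
        return ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage();
    }

    /** Recent whole-system CPU usage between 0.0 and 1.0; negative if unavailable. */
    public static double systemCpuLoad() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            return ((com.sun.management.OperatingSystemMXBean) os).getSystemCpuLoad();
        }
        return -1.0; // extension bean not present on this JVM
    }

    public static void main(String[] args) {
        // The first value is unbounded above; the second never exceeds 1.0.
        System.out.println("load average    = " + loadAverage());
        System.out.println("system CPU load = " + systemCpuLoad());
    }
}
```

On a busy multi-core host the first number can sit well above 1.0 while the second stays within 0.0 to 1.0, which is why the two cannot share one threshold.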

Which metric is used for the memory threshold? My best guess is that the percentage is calculated from the MemoryUsage object returned by MemoryMXBean.getHeapMemoryUsage().

https://docs.oracle.com/javase/7/docs/api/java/lang/management/MemoryMXBean.html
https://docs.oracle.com/javase/7/docs/api/java/lang/management/MemoryUsage.html
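
If that guess is right, the percentage would be computed roughly as in the sketch below. This is an illustration of the guess, not Solr's code: the class `HeapUsageProbe` and its methods are hypothetical, and falling back to the committed size when `getMax()` is undefined (-1) is an assumption of this example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapUsageProbe {

    /**
     * Fraction of the heap in use. Uses max as the limit; if max is
     * undefined (reported as -1), falls back to the committed size.
     */
    public static double usedFraction(long used, long committed, long max) {
        long limit = (max > 0) ? max : committed;
        return (limit > 0) ? (double) used / limit : 0.0;
    }

    /** Reads the current heap figures from the platform MemoryMXBean. */
    public static double currentHeapFraction() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return usedFraction(heap.getUsed(), heap.getCommitted(), heap.getMax());
    }

    public static void main(String[] args) {
        System.out.printf("heap in use: %.1f%%%n", 100.0 * currentHeapFraction());
    }
}
```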

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)


Re: CPU and memory circuit breaker documentation issues

Posted by Dmitri Maziuk <dm...@gmail.com>.
On 12/18/2020 11:26 AM, Walter Underwood wrote:
...
> I’ll file a bug and submit a patch to use OperatingSystemMXBean.getSystemCpuLoad(). How do I fix the documentation?

CPU load may be good if your process is CPU-bound. If you're stuck on 
iowait in your $data filesystem and not "actively running", it's not 
clear that OperatingSystemMXBean.getSystemCpuLoad() will account for 
that. IIRC load average will, so you may in fact be better off using 
that divided by number of cores (which you can also get from the bean).
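
A rough sketch of that idea, assuming a hypothetical helper class `NormalizedLoad`: both `getSystemLoadAverage()` and `getAvailableProcessors()` come from the standard `java.lang.management.OperatingSystemMXBean`. Returning -1.0 when the load average is unavailable is a choice made for this example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class NormalizedLoad {

    /** Load average per core; -1.0 if the inputs are unusable. */
    public static double perCore(double loadAverage, int cores) {
        if (loadAverage < 0 || cores <= 0) {
            return -1.0; // load average not available on this platform
        }
        return loadAverage / cores;
    }

    /** Reads the load average and the core count from the same bean. */
    public static double current() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        return perCore(os.getSystemLoadAverage(), os.getAvailableProcessors());
    }

    public static void main(String[] args) {
        System.out.println("load average per core = " + current());
    }
}
```

Note that the result is not capped at 1.0: a load average of 32 on 16 cores gives 2.0, so a threshold on this value needs a wider range than a CPU usage fraction does.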

Dima

Re: CPU and memory circuit breaker documentation issues

Posted by Walter Underwood <wu...@wunderwood.org>.
Thanks. I’m already familiar with adoc. https://issues.apache.org/jira/browse/SOLR-15056

Now I need to brush up on How To Contribute.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Dec 18, 2020, at 12:23 PM, Anshum Gupta <an...@anshumgupta.net> wrote:
> 
> Hi Walter,
> 
> Thanks for taking this up.
> 
> You can file a PR for the documentation change too as our docs are now a
> part of the repo. Here's where you can find the docs:
> https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide
> 
> 
> On Fri, Dec 18, 2020 at 9:26 AM Walter Underwood <wu...@wunderwood.org>
> wrote:
> 
>> Looking at the code, the CPU circuit breaker is unusable.
>> 
>> This actually does use Unix load average
>> (operatingSystemMXBean.getSystemLoadAverage()). That is a terrible idea.
>> Interpreting the load average requires knowing the number of CPUs on a
>> system. If I have 16 CPUs, I would probably set the limit at 16, with one
>> process waiting for each CPU.
>> 
>> Unfortunately, this implementation limits the thresholds to 0.5 to 0.95,
>> because the implementer thought they were getting a CPU usage value, I
>> guess. So the whole thing doesn’t work right.
>> 
>> I’ll file a bug and submit a patch to use
>> OperatingSystemMXBean.getSystemCpuLoad(). How do I fix the documentation?
>> 
>> wunder
>> Walter Underwood
>> wunder@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
> 
> -- 
> Anshum Gupta


Re: CPU and memory circuit breaker documentation issues

Posted by Anshum Gupta <an...@anshumgupta.net>.
Hi Walter,

Thanks for taking this up.

You can file a PR for the documentation change too as our docs are now a
part of the repo. Here's where you can find the docs:
https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide


On Fri, Dec 18, 2020 at 9:26 AM Walter Underwood <wu...@wunderwood.org>
wrote:

> Looking at the code, the CPU circuit breaker is unusable.
>
> This actually does use Unix load average
> (operatingSystemMXBean.getSystemLoadAverage()). That is a terrible idea.
> Interpreting the load average requires knowing the number of CPUs on a
> system. If I have 16 CPUs, I would probably set the limit at 16, with one
> process waiting for each CPU.
>
> Unfortunately, this implementation limits the thresholds to 0.5 to 0.95,
> because the implementer thought they were getting a CPU usage value, I
> guess. So the whole thing doesn’t work right.
>
> I’ll file a bug and submit a patch to use
> OperatingSystemMXBean.getSystemCpuLoad(). How do I fix the documentation?
>
> wunder
> Walter Underwood
> wunder@wunderwood.org
> http://observer.wunderwood.org/  (my blog)

-- 
Anshum Gupta

Re: CPU and memory circuit breaker documentation issues

Posted by Walter Underwood <wu...@wunderwood.org>.
Looking at the code, the CPU circuit breaker is unusable.

This actually does use Unix load average (operatingSystemMXBean.getSystemLoadAverage()). That is a terrible idea. Interpreting the load average requires knowing the number of CPUs on a system. If I have 16 CPUs, I would probably set the limit at 16, with one process waiting for each CPU.

Unfortunately, this implementation limits the thresholds to 0.5 to 0.95, because the implementer thought they were getting a CPU usage value, I guess. So the whole thing doesn’t work right.
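
The mismatch can be shown with plain arithmetic. The sketch below is not Solr's code; the class `ThresholdMismatch`, its method names, and the strict greater-than comparison are assumptions for illustration. Only the 0.5 to 0.95 range and the 16-CPU example come from the description above.

```java
public class ThresholdMismatch {

    // Allowed threshold range as described above.
    static final double MIN_THRESHOLD = 0.5;
    static final double MAX_THRESHOLD = 0.95;

    /** True if the configured threshold falls inside the allowed range. */
    public static boolean isValidThreshold(double threshold) {
        return threshold >= MIN_THRESHOLD && threshold <= MAX_THRESHOLD;
    }

    /** True if the measured value exceeds the threshold (assumed comparison). */
    public static boolean trips(double metric, double threshold) {
        return metric > threshold;
    }

    public static void main(String[] args) {
        int cpus = 16;
        double loadAverage = 4.0; // a lightly loaded 16-CPU host

        // Compared as a raw load average, even the highest allowed threshold trips.
        System.out.println(trips(loadAverage, MAX_THRESHOLD));
        // Divided by the CPU count, the same load is well under the threshold.
        System.out.println(trips(loadAverage / cpus, MAX_THRESHOLD));
        // The limit one would actually want for a raw load average is rejected.
        System.out.println(isValidThreshold(16.0));
    }
}
```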

I’ll file a bug and submit a patch to use OperatingSystemMXBean.getSystemCpuLoad(). How do I fix the documentation?

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)
