You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@solr.apache.org by GitBox <gi...@apache.org> on 2021/04/27 00:07:44 UTC

[GitHub] [solr] wrunderwood opened a new pull request #96: SOLR-15056: add circuit breaker for CPU, fix load circuit breaker

wrunderwood opened a new pull request #96:
URL: https://github.com/apache/solr/pull/96


   SOLR-15056  add circuit breaker for CPU utilization, use accurate name for load average circuit breaker, update docs and tests
   
   <!--
   _(If you are a project committer then you may remove some/all of the following template.)_
   
   Before creating a pull request, please file an issue in the ASF Jira system for Solr:
   
   * https://issues.apache.org/jira/projects/SOLR
   
   You will need to create an account in Jira in order to create an issue.
   
   The title of the PR should reference the Jira issue number in the form:
   
   * SOLR-####: <short description of problem or changes>
   
   SOLR must be fully capitalized. A short description helps people scanning pull requests for items they can work on.
   
   Properly referencing the issue in the title ensures that Jira is correctly updated with code review comments and commits. -->
   
   
   # Description
   
   Please provide a short description of the changes you're making with this pull request.
   
   # Solution
   
   Please provide a short description of the approach taken to implement your solution.
   
   # Tests
   
   Please describe the tests you've developed or run to confirm this patch implements the feature or solves the problem.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [ ] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability.
   - [ ] I have created a Jira issue and added the issue ID to my pull request title.
   - [ ] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
   - [ ] I have developed this patch against the `main` branch.
   - [ ] I have run `./gradlew check`.
   - [ ] I have added tests for my changes.
   - [ ] I have added documentation for the [Reference Guide](https://github.com/apache/solr/tree/main/solr/solr-ref-guide)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] cpoerschke commented on pull request #96: SOLR-15056: add circuit breaker for CPU, fix load circuit breaker

Posted by GitBox <gi...@apache.org>.
cpoerschke commented on pull request #96:
URL: https://github.com/apache/solr/pull/96#issuecomment-853328967


   So one possibility is to change the implementation of the existing `CPUCircuitBreaker` class and to add a new `LoadAverageCircuitBreaker` class which has the same behaviour as the existing `CPUCircuitBreaker` class. Changing an existing class's behaviour may bring backwards compatibility concerns and there being two related circuit breaker classes may raise questions of code duplication and whether or not to factor out a common base class.
   
   I wonder if another possibility could be to have a single `CPUCircuitBreaker` class and for a configurable attribute to determine the implementation's behaviour:
   
   Existing users with a configuration like this
   
   ```
   <circuitBreaker class="solr.CircuitBreakerManager" enabled="true">
     <str name="cpuEnabled">true</str>
     <str name="cpuThreshold">42</str>
   </circuitBreaker>
   ```
   
   would continue to get their existing behaviour, equivalent to
   
   ```
   <circuitBreaker class="solr.CircuitBreakerManager" enabled="true">
     <str name="cpuEnabled">true</str>
     <str name="cpuThreshold">42</str>
     <str name="cpuMethod">OperatingSystemMXBean</str>
   </circuitBreaker>
   ```
   
   configuration and users wishing to use the new alternative implementation can do so via
   
   ```
   <circuitBreaker class="solr.CircuitBreakerManager" enabled="true">
     <str name="cpuEnabled">true</str>
     <str name="cpuThreshold">7</str>
     <str name="cpuMethod">SolrMetricManager</str>
   </circuitBreaker>
   ```
   
   configuration.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] atris commented on a change in pull request #96: SOLR-15056: add circuit breaker for CPU, fix load circuit breaker

Posted by GitBox <gi...@apache.org>.
atris commented on a change in pull request #96:
URL: https://github.com/apache/solr/pull/96#discussion_r644666698



##########
File path: solr/core/src/test/org/apache/solr/util/TestCircuitBreaker.java
##########
@@ -199,7 +201,63 @@ public void testFakeCPUCircuitBreaker() {
       PluginInfo pluginInfo = h.getCore().getSolrConfig().getPluginInfo(CircuitBreakerManager.class.getName());
 
       CircuitBreaker.CircuitBreakerConfig circuitBreakerConfig = CircuitBreakerManager.buildCBConfig(pluginInfo);
-      CircuitBreaker circuitBreaker = new FakeCPUCircuitBreaker(circuitBreakerConfig);
+      CircuitBreaker circuitBreaker = new FakeCPUCircuitBreaker(circuitBreakerConfig, null);
+
+      h.getCore().getCircuitBreakerManager().register(circuitBreaker);
+
+      List<Future<?>> futures = new ArrayList<>();
+
+      for (int i = 0; i < 5; i++) {
+        Future<?> future = executor.submit(() -> {
+          try {
+            h.query(req("name:\"john smith\""));
+          } catch (SolrException e) {
+            assertThat(e.getMessage(), containsString("Circuit Breakers tripped"));
+            failureCount.incrementAndGet();
+          } catch (Exception e) {
+            throw new RuntimeException(e.getMessage());
+          }
+        });
+
+        futures.add(future);
+      }
+
+      for  (Future<?> future : futures) {
+        try {
+          future.get();
+        } catch (Exception e) {
+          throw new RuntimeException(e.getMessage());
+        }
+      }
+
+      executor.shutdown();
+      try {
+        executor.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
+      } catch (InterruptedException e) {
+        Thread.currentThread().interrupt();
+        throw new RuntimeException(e.getMessage());
+      }
+
+      assertEquals("Number of failed queries is not correct",5, failureCount.get());
+    } finally {
+      if (!executor.isShutdown()) {
+        executor.shutdown();
+      }
+    }
+  }
+
+  public void testFakeLoadAverageCircuitBreaker() {
+    AtomicInteger failureCount = new AtomicInteger();
+
+    ExecutorService executor = ExecutorUtil.newMDCAwareCachedThreadPool(
+        new SolrNamedThreadFactory("TestCircuitBreaker"));
+    try {
+      removeAllExistingCircuitBreakers();
+
+      PluginInfo pluginInfo = h.getCore().getSolrConfig().getPluginInfo(CircuitBreakerManager.class.getName());

Review comment:
       That does not answer my question -- they are still exact copies of each other. Lets abstract to a parent class?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] wrunderwood commented on a change in pull request #96: SOLR-15056: add circuit breaker for CPU, fix load circuit breaker

Posted by GitBox <gi...@apache.org>.
wrunderwood commented on a change in pull request #96:
URL: https://github.com/apache/solr/pull/96#discussion_r643423811



##########
File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/CPUCircuitBreaker.java
##########
@@ -110,6 +113,27 @@ public double getCpuUsageThreshold() {
   }
 
   protected double calculateLiveCPUUsage() {
-    return operatingSystemMXBean.getSystemLoadAverage();
+    Metric metric = this.core
+        .getCoreContainer()
+        .getMetricManager()
+        .registry("solr.jvm")
+        .getMetrics()
+        .get("os.systemCpuLoad");
+
+    if (metric == null) {
+        return -1.0;
+    }
+
+    if (metric instanceof Gauge) {
+      @SuppressWarnings({"rawtypes"})
+          Gauge gauge = (Gauge) metric;
+      // unwrap if needed
+      if (gauge instanceof SolrMetricManager.GaugeWrapper) {
+        gauge = ((SolrMetricManager.GaugeWrapper) gauge).getGauge();
+      }
+      return ((Double) gauge.getValue()).doubleValue();
+    }
+
+    return -1.0;                // Unable to unpack metric

Review comment:
       There is a warning, see the documentation for details.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] wrunderwood commented on a change in pull request #96: SOLR-15056: add circuit breaker for CPU, fix load circuit breaker

Posted by GitBox <gi...@apache.org>.
wrunderwood commented on a change in pull request #96:
URL: https://github.com/apache/solr/pull/96#discussion_r644391283



##########
File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/CircuitBreakerManager.java
##########
@@ -40,12 +41,15 @@
  * solution. There will be a follow up with a SIP for a schema API design.
  */
 public class CircuitBreakerManager implements PluginInfoInitialized {
+  private final SolrCore core;

Review comment:
       I was unaware of SolrCoreAware, I'll go check it out. Thanks.
   
   The other larger change is to have circuit breakers only interrupt external requests and to not interrupt internal requests to shards. The current approach multiples failures. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] janhoy commented on pull request #96: SOLR-15056: add circuit breaker for CPU, fix load circuit breaker

Posted by GitBox <gi...@apache.org>.
janhoy commented on pull request #96:
URL: https://github.com/apache/solr/pull/96#issuecomment-853442218


   > Whoever implemented this did not understand how the load average metric works.
   
   Please keep the discussion factual and about the code, not the people.
   
   I think the chosen path of having separate LoadAvg and CpuPct implementations is better than some extra flag to choose. This keeps the configuration more direct and clear, and the documentation of `cpuThreshold` will not be conditional on another parameter - the two impls can be documented separate from each other.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] wrunderwood commented on pull request #96: SOLR-15056: add circuit breaker for CPU, fix load circuit breaker

Posted by GitBox <gi...@apache.org>.
wrunderwood commented on pull request #96:
URL: https://github.com/apache/solr/pull/96#issuecomment-853347376


   That is what this patch does, move the existing behavior to a load average circuit breaker and create a new CPU usage circuit breaker.
   
   Combining the two has a problem. The range of values for the two metrics are quite different. CPU usage is limited to 0 to 1.0. Load average is unbounded, and useful limits are much greater than 1.0, typically set at the number of CPUs in the system. On a 32-way system, a good limit would be around 32.
   
   Also, load average is not a CPU metric. It is a “system load” metric. On Linux, it includes processes waiting on IO.
   
   I don’t think we need to worry about existing users. With the load average (“CPU”) circuit breaker limited to 1.0, it is not usable.
   
   wunder
   Walter Underwood
   ***@***.***
   http://observer.wunderwood.org/  (my blog)
   
   > On Jun 2, 2021, at 12:33 PM, Christine Poerschke ***@***.***> wrote:
   > 
   > 
   > So one possibility is to change the implementation of the existing CPUCircuitBreaker class and to add a new LoadAverageCircuitBreaker class which has the same behaviour as the existing CPUCircuitBreaker class. Changing an existing class's behaviour may bring backwards compatibility concerns and there being two related circuit breaker classes may raise questions of code duplication and whether or not to factor out a common base class.
   > 
   > I wonder if another possibility could be to have a single CPUCircuitBreaker class and for a configurable attribute to determine the implementation's behaviour:
   > 
   > Existing users with a configuration like this
   > 
   > <circuitBreaker class="solr.CircuitBreakerManager" enabled="true">
   >   <str name="cpuEnabled">true</str>
   >   <str name="cpuThreshold">42</str>
   > </circuitBreaker>
   > would continue to get their existing behaviour, equivalent to
   > 
   > <circuitBreaker class="solr.CircuitBreakerManager" enabled="true">
   >   <str name="cpuEnabled">true</str>
   >   <str name="cpuThreshold">42</str>
   >   <str name="cpuMethod">OperatingSystemMXBean</str>
   > </circuitBreaker>
   > configuration and users wishing to use the new alternative implementation can do so via
   > 
   > <circuitBreaker class="solr.CircuitBreakerManager" enabled="true">
   >   <str name="cpuEnabled">true</str>
   >   <str name="cpuThreshold">7</str>
   >   <str name="cpuMethod">SolrMetricManager</str>
   > </circuitBreaker>
   > configuration.
   > 
   > —
   > You are receiving this because you authored the thread.
   > Reply to this email directly, view it on GitHub <https://github.com/apache/solr/pull/96#issuecomment-853328967>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ACR4IINBRUWYVWNQNUBJ6NTTQ2BSPANCNFSM43T5FIZA>.
   > 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] atris commented on a change in pull request #96: SOLR-15056: add circuit breaker for CPU, fix load circuit breaker

Posted by GitBox <gi...@apache.org>.
atris commented on a change in pull request #96:
URL: https://github.com/apache/solr/pull/96#discussion_r644474656



##########
File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/CircuitBreakerManager.java
##########
@@ -40,12 +41,15 @@
  * solution. There will be a follow up with a SIP for a schema API design.
  */
 public class CircuitBreakerManager implements PluginInfoInitialized {
+  private final SolrCore core;

Review comment:
       Thanks for the input, @janhoy . This is elegant and a clean solution.
   
   RE: Not having a common manager, a common manager is needed so that SolrCore has to deal. with only one access point for all CBs. However, I agree that CircuitBreakerManager should not need to be explicitly aware of each CB implementation, will fix that for 9.0
   
   Also, I am confused about your comment of the common config class -- there is a common config class already for all circuit breakers, and the configuration is completely pluggable since it implements the PluginInfo pattern?
   
   
   
   

##########
File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/LoadAverageCircuitBreaker.java
##########
@@ -0,0 +1,115 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.util.circuitbreaker;
+
+import java.lang.invoke.MethodHandles;
+import java.lang.management.ManagementFactory;
+import java.lang.management.OperatingSystemMXBean;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * <p>
+ * Tracks current system load average and triggers if the specified threshold is breached.
+ *
+ * This circuit breaker gets the load average (length of the run queue) over the last
+ * minute and uses that data to take a decision. We depend on OperatingSystemMXBean which does
+ * not allow a configurable interval of collection of data.
+ * //TODO: Use Codahale Meter to calculate the value locally.
+ * </p>
+ *
+ * <p>
+ * The configuration to define which mode to use and the trigger threshold are defined in
+ * solrconfig.xml
+ * </p>
+ */
+public class LoadAverageCircuitBreaker extends CircuitBreaker {
+  private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+  private static final OperatingSystemMXBean operatingSystemMXBean = ManagementFactory.getOperatingSystemMXBean();
+
+  private final boolean enabled;
+  private final double loadAverageThreshold;
+
+  // Assumption -- the value of these parameters will be set correctly before invoking getDebugInfo()
+  private static final ThreadLocal<Double> seenLoadAverage = ThreadLocal.withInitial(() -> 0.0);
+
+  private static final ThreadLocal<Double> allowedLoadAverage = ThreadLocal.withInitial(() -> 0.0);
+
+  public LoadAverageCircuitBreaker(CircuitBreakerConfig config) {
+    super(config);
+
+    this.enabled = config.getLoadAverageCBEnabled();
+    this.loadAverageThreshold = config.getLoadAverageCBThreshold();
+  }
+
+  @Override
+  public boolean isTripped() {
+    if (!isEnabled()) {
+      return false;
+    }
+
+    if (!enabled) {
+      return false;
+    }
+
+    double localAllowedLoadAverage = getLoadAverageThreshold();
+    double localSeenLoadAverage = calculateLiveLoadAverage();
+
+    if (localSeenLoadAverage < 0) {
+      if (log.isWarnEnabled()) {

Review comment:
       @janhoy I am sorry, but the refactorings proposed do not really have anything to do with the actual circuit breakers themselves.
   
   The two circuit breakers (load average and proposed CPU CB) share enough method signatures to warrant common interfaces, and their tests are alike apart from class names. This is a bad code smell enough to be fixed, and I do not see why it cannot be done in this PR itself.

##########
File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/CPUCircuitBreaker.java
##########
@@ -110,6 +113,27 @@ public double getCpuUsageThreshold() {
   }
 
   protected double calculateLiveCPUUsage() {
-    return operatingSystemMXBean.getSystemLoadAverage();
+    Metric metric = this.core
+        .getCoreContainer()
+        .getMetricManager()
+        .registry("solr.jvm")
+        .getMetrics()
+        .get("os.systemCpuLoad");

Review comment:
       Yes, so here is the idea:
   
   Typical Solr clusters are dedicated node models (Solr is not expected to share resources with any other process). To my knowledge, most IO bound tasks in Solr also have a CPU impact so the possibility of a Solr operation having such a massive IO wait that it triggers the CB *without* having any significant impact on CPU is low (unless you are running Solr on USB storage drives).
   
   Please remember that the circuit breaker is designed to *only* deal with typical Solr load patterns, hence is not targeted for generic situations. Unless misconfigured, Solr will use multiple CPUs if they are available, hence I do not understand your point.
   
   Even in the scenario you described, you can simply set the trigger threshold of the CB higher so that it will trigger only when CPU usage is also high. 
   
   Let's say you set the threshold to 200. You will need more than 150 IO waiting processes *and* moderate CPU usage to trigger the threshold.
   
   Also, load average is computed by contributions of three factors -- CPU usage, tasks waiting for CPU and (in some implementations) IO blocked tasks. If you set the triggering threshold high enough, there is a very slim possibility that the IO blocked processes metric can overwhelm the others and cause the CB to trigger.
   
   But we digress. Solr is an open source project. Circuit breakers are designed to be extendable. If you need a circuit breaker with a different metric, by all means, please implement the same.
   
   Let us now focus on your complaints against the existing circuit breaker:
   
   1. Documentation not reflecting that in the platforms that choose to include IO waiting tasks as a factor in load average, CPU CB will implicitly include the same in its triggering threshold. I agree this should be included and I thank you for fixing this.
   
   2. Existing CB does not work because it limits the threshold that can be set between 0 and 1 -- Where did that come from?
   
   Solr's circuit breaker documentation (https://solr.apache.org/guide/8_7/circuit-breakers.html) does not state anything of this sort for the CPU circuit breaker. Nor does the code implement anything of this sort. The triggering threshold is unbounded, because the load average metric is unbounded. At work, we use a threshold in multiples of 100 and it works perfectly well for us.
   
   So, where is this claim coming from? Where exactly have you seen the limits being enforced?
   
   Your entire premise that the existing code does not work is based on this fulcrum of the limits, which do not exist in the code or documentation. You have been perpetually defaming the feature and have even gone to the extent of "warning" our users to immediately stop using it. My question is -- what is the basis of the claim?

##########
File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/CircuitBreakerManager.java
##########
@@ -40,12 +41,15 @@
  * solution. There will be a follow up with a SIP for a schema API design.
  */
 public class CircuitBreakerManager implements PluginInfoInitialized {
+  private final SolrCore core;

Review comment:
       @wrunderwood I have highlighted this to yourself before months back when you asked the same question, and I will repeat it again -- Solr 8x has no way of distinguishing between internal and external requests hence there is no way for circuit breakers to know what kind of requests they are seeing. I implemented this functionality in SOLR-13528 and will be doing the request level disambiguation in 9.x but there is still no way to identify the same in 8.x

##########
File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/CircuitBreaker.java
##########
@@ -66,14 +66,19 @@ protected boolean isEnabled() {
     private final int memCBThreshold;
     private final boolean cpuCBEnabled;
     private final int cpuCBThreshold;
+    private final boolean loadAverageCBEnabled;
+    private final double loadAverageCBThreshold;
 
     public CircuitBreakerConfig(final boolean enabled, final boolean memCBEnabled, final int memCBThreshold,
-                                  final boolean cpuCBEnabled, final int cpuCBThreshold) {
+                                final boolean cpuCBEnabled, final int cpuCBThreshold,

Review comment:
       I was advocating including the name of the metric being used for the CPU enabled CB (and not call it cpuCBEnabled)

##########
File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/CircuitBreakerManager.java
##########
@@ -40,12 +41,15 @@
  * solution. There will be a follow up with a SIP for a schema API design.
  */
 public class CircuitBreakerManager implements PluginInfoInitialized {
+  private final SolrCore core;

Review comment:
       There is now, post SOLR-13528, in 9x. There was none in 8x, hence I had to add that mechanism.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] atris commented on a change in pull request #96: SOLR-15056: add circuit breaker for CPU, fix load circuit breaker

Posted by GitBox <gi...@apache.org>.
atris commented on a change in pull request #96:
URL: https://github.com/apache/solr/pull/96#discussion_r639601638



##########
File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/CircuitBreaker.java
##########
@@ -66,14 +66,19 @@ protected boolean isEnabled() {
     private final int memCBThreshold;
     private final boolean cpuCBEnabled;
     private final int cpuCBThreshold;
+    private final boolean loadAverageCBEnabled;
+    private final double loadAverageCBThreshold;
 
     public CircuitBreakerConfig(final boolean enabled, final boolean memCBEnabled, final int memCBThreshold,
-                                  final boolean cpuCBEnabled, final int cpuCBThreshold) {
+                                final boolean cpuCBEnabled, final int cpuCBThreshold,

Review comment:
       Rename to highlight the metric (like you did for loadAverageCB renaming)

##########
File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/CPUCircuitBreaker.java
##########
@@ -110,6 +113,27 @@ public double getCpuUsageThreshold() {
   }
 
   protected double calculateLiveCPUUsage() {
-    return operatingSystemMXBean.getSystemLoadAverage();
+    Metric metric = this.core
+        .getCoreContainer()
+        .getMetricManager()
+        .registry("solr.jvm")
+        .getMetrics()
+        .get("os.systemCpuLoad");
+
+    if (metric == null) {
+        return -1.0;
+    }
+
+    if (metric instanceof Gauge) {
+      @SuppressWarnings({"rawtypes"})
+          Gauge gauge = (Gauge) metric;
+      // unwrap if needed
+      if (gauge instanceof SolrMetricManager.GaugeWrapper) {
+        gauge = ((SolrMetricManager.GaugeWrapper) gauge).getGauge();
+      }
+      return ((Double) gauge.getValue()).doubleValue();
+    }
+
+    return -1.0;                // Unable to unpack metric

Review comment:
       Please log a warning here -- we should not silently project that the system is not under load

##########
File path: solr/core/src/test/org/apache/solr/util/TestCircuitBreaker.java
##########
@@ -335,13 +393,24 @@ protected long calculateLiveMemoryUsage() {
   }
 
   private static class FakeCPUCircuitBreaker extends CPUCircuitBreaker {
-    public FakeCPUCircuitBreaker(CircuitBreakerConfig config) {
-      super(config);
+    public FakeCPUCircuitBreaker(CircuitBreakerConfig config, SolrCore core) {
+      super(config, core);
     }
 
     @Override
     protected double calculateLiveCPUUsage() {
       return 92; // Return a value large enough to trigger the circuit breaker
     }
   }
+
+  private static class FakeLoadAverageCircuitBreaker extends LoadAverageCircuitBreaker {
+    public FakeLoadAverageCircuitBreaker(CircuitBreakerConfig config) {

Review comment:
       Again, in this class not an exact copy of FakeCPUCircuitBreaker?

##########
File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/CPUCircuitBreaker.java
##########
@@ -110,6 +113,27 @@ public double getCpuUsageThreshold() {
   }
 
   protected double calculateLiveCPUUsage() {
-    return operatingSystemMXBean.getSystemLoadAverage();
+    Metric metric = this.core
+        .getCoreContainer()
+        .getMetricManager()
+        .registry("solr.jvm")
+        .getMetrics()
+        .get("os.systemCpuLoad");

Review comment:
       I dont quite understand this change -- so you are replacing the default CPU circuit breaker with a new implementation and moving the earlier implementation to a new class?
   
   IMO, both the circuit breakers operate on CPU, using different metrics. Both circuit breakers should be named in a way which represents the fact that they operate on CPU and highlight which metric they depend on.

##########
File path: solr/core/src/test/org/apache/solr/util/TestCircuitBreaker.java
##########
@@ -199,7 +201,63 @@ public void testFakeCPUCircuitBreaker() {
       PluginInfo pluginInfo = h.getCore().getSolrConfig().getPluginInfo(CircuitBreakerManager.class.getName());
 
       CircuitBreaker.CircuitBreakerConfig circuitBreakerConfig = CircuitBreakerManager.buildCBConfig(pluginInfo);
-      CircuitBreaker circuitBreaker = new FakeCPUCircuitBreaker(circuitBreakerConfig);
+      CircuitBreaker circuitBreaker = new FakeCPUCircuitBreaker(circuitBreakerConfig, null);

Review comment:
       Please add a comment representing what the null stands for

##########
File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/LoadAverageCircuitBreaker.java
##########
@@ -0,0 +1,115 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.util.circuitbreaker;
+
+import java.lang.invoke.MethodHandles;
+import java.lang.management.ManagementFactory;
+import java.lang.management.OperatingSystemMXBean;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * <p>
+ * Tracks current system load average and triggers if the specified threshold is breached.
+ *
+ * This circuit breaker gets the load average (length of the run queue) over the last
+ * minute and uses that data to take a decision. We depend on OperatingSystemMXBean which does
+ * not allow a configurable interval of collection of data.
+ * //TODO: Use Codahale Meter to calculate the value locally.
+ * </p>
+ *
+ * <p>
+ * The configuration to define which mode to use and the trigger threshold are defined in
+ * solrconfig.xml
+ * </p>
+ */
+public class LoadAverageCircuitBreaker extends CircuitBreaker {
+  private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+  private static final OperatingSystemMXBean operatingSystemMXBean = ManagementFactory.getOperatingSystemMXBean();
+
+  private final boolean enabled;
+  private final double loadAverageThreshold;
+
+  // Assumption -- the value of these parameters will be set correctly before invoking getDebugInfo()
+  private static final ThreadLocal<Double> seenLoadAverage = ThreadLocal.withInitial(() -> 0.0);
+
+  private static final ThreadLocal<Double> allowedLoadAverage = ThreadLocal.withInitial(() -> 0.0);
+
+  public LoadAverageCircuitBreaker(CircuitBreakerConfig config) {
+    super(config);
+
+    this.enabled = config.getLoadAverageCBEnabled();
+    this.loadAverageThreshold = config.getLoadAverageCBThreshold();
+  }
+
+  @Override
+  public boolean isTripped() {
+    if (!isEnabled()) {
+      return false;
+    }
+
+    if (!enabled) {
+      return false;
+    }
+
+    double localAllowedLoadAverage = getLoadAverageThreshold();
+    double localSeenLoadAverage = calculateLiveLoadAverage();
+
+    if (localSeenLoadAverage < 0) {
+      if (log.isWarnEnabled()) {

Review comment:
       There is a fair bit of overlap between the two circuit breakers -- can we move the common code and methods to a super class, and define the contract which both (and future) CPU circuit breakers need to adhere to?

##########
File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/CircuitBreakerManager.java
##########
@@ -40,12 +41,15 @@
  * solution. There will be a follow up with a SIP for a schema API design.
  */
 public class CircuitBreakerManager implements PluginInfoInitialized {
+  private final SolrCore core;

Review comment:
       I am wary of adding a SolrCore dependency here since only one circuit breaker really uses it and this pollutes the API a fair bit.

##########
File path: solr/solr-ref-guide/src/circuit-breakers.adoc
##########
@@ -94,16 +114,45 @@ Configuration for CPU utilization based circuit breaker:
 Note that this configuration will be overridden by the global circuit breaker flag -- if circuit breakers are disabled, this flag
 will not help you.
 
-The triggering threshold is defined in units of CPU utilization. The configuration to control this is as below:
+The triggering threshold is defined in percent CPU usage. A value of "0" maps to 0% usage 
+and a value of "100" maps to 100% usage.  This circuit breaker will trip when the CPU usage is
+equal to or greater than 75%:
 
 [source,xml]
 ----
 <str name="cpuThreshold">75</str>
 ----
 
+=== System Load Average Circuit Breaker
+This circuit breaker tracks system load average and triggers if the recent load average exceeds a configurable threshold.
+
+This is tracked with the JMX metric `OperatingSystemMXBean.getSystemLoadAverage()`. That measures the
+recent load average for the whole system. A "load average" is the number of processes using or waiting for a CPU,
+usually averaged over one minute. Some systems include processes waiting on IO in the load average. Check the
+documentation for your system and JVM to understand this metric. For more information, see the
+https://en.wikipedia.org/wiki/Load_(computing)[Wikipedia page for Load],
+
+Configuration for load average circuit breaker:
+
+[source,xml]
+----
+<str name="loadAverageEnabled">true</str>
+----
+
+Note that this configuration will be overridden by the global circuit breaker flag -- if circuit breakers are disabled, this flag
+will not help you.
+
+The triggering threshold is a floating point number matching load average. This circuit breaker will trip when the

Review comment:
       Not reviewing these changes -- will depend on Cassandra's review at some point

##########
File path: solr/core/src/test/org/apache/solr/util/TestCircuitBreaker.java
##########
@@ -199,7 +201,63 @@ public void testFakeCPUCircuitBreaker() {
       PluginInfo pluginInfo = h.getCore().getSolrConfig().getPluginInfo(CircuitBreakerManager.class.getName());
 
       CircuitBreaker.CircuitBreakerConfig circuitBreakerConfig = CircuitBreakerManager.buildCBConfig(pluginInfo);
-      CircuitBreaker circuitBreaker = new FakeCPUCircuitBreaker(circuitBreakerConfig);
+      CircuitBreaker circuitBreaker = new FakeCPUCircuitBreaker(circuitBreakerConfig, null);
+
+      h.getCore().getCircuitBreakerManager().register(circuitBreaker);
+
+      List<Future<?>> futures = new ArrayList<>();
+
+      for (int i = 0; i < 5; i++) {
+        Future<?> future = executor.submit(() -> {
+          try {
+            h.query(req("name:\"john smith\""));
+          } catch (SolrException e) {
+            assertThat(e.getMessage(), containsString("Circuit Breakers tripped"));
+            failureCount.incrementAndGet();
+          } catch (Exception e) {
+            throw new RuntimeException(e.getMessage());
+          }
+        });
+
+        futures.add(future);
+      }
+
+      for  (Future<?> future : futures) {
+        try {
+          future.get();
+        } catch (Exception e) {
+          throw new RuntimeException(e.getMessage());
+        }
+      }
+
+      executor.shutdown();
+      try {
+        executor.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
+      } catch (InterruptedException e) {
+        Thread.currentThread().interrupt();
+        throw new RuntimeException(e.getMessage());
+      }
+
+      assertEquals("Number of failed queries is not correct",5, failureCount.get());
+    } finally {
+      if (!executor.isShutdown()) {
+        executor.shutdown();
+      }
+    }
+  }
+
+  public void testFakeLoadAverageCircuitBreaker() {
+    AtomicInteger failureCount = new AtomicInteger();
+
+    ExecutorService executor = ExecutorUtil.newMDCAwareCachedThreadPool(
+        new SolrNamedThreadFactory("TestCircuitBreaker"));
+    try {
+      removeAllExistingCircuitBreakers();
+
+      PluginInfo pluginInfo = h.getCore().getSolrConfig().getPluginInfo(CircuitBreakerManager.class.getName());

Review comment:
       Isnt this test doing the exact same thing as the original CPU circuit breaker test, only using a different class?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] janhoy commented on a change in pull request #96: SOLR-15056: add circuit breaker for CPU, fix load circuit breaker

Posted by GitBox <gi...@apache.org>.
janhoy commented on a change in pull request #96:
URL: https://github.com/apache/solr/pull/96#discussion_r644375943



##########
File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/CircuitBreaker.java
##########
@@ -66,14 +66,19 @@ protected boolean isEnabled() {
     private final int memCBThreshold;
     private final boolean cpuCBEnabled;
     private final int cpuCBThreshold;
+    private final boolean loadAverageCBEnabled;
+    private final double loadAverageCBThreshold;
 
     public CircuitBreakerConfig(final boolean enabled, final boolean memCBEnabled, final int memCBThreshold,
-                                  final boolean cpuCBEnabled, final int cpuCBThreshold) {
+                                final boolean cpuCBEnabled, final int cpuCBThreshold,

Review comment:
       After this change there will be tree CBs available: `memCBEnabled`, `loadAverageCBEnabled` (previously cpuCBEnabled), and `cpuCBEnabled` (the new one). So this should be correct, not?

##########
File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/CPUCircuitBreaker.java
##########
@@ -110,6 +113,27 @@ public double getCpuUsageThreshold() {
   }
 
   protected double calculateLiveCPUUsage() {
-    return operatingSystemMXBean.getSystemLoadAverage();
+    Metric metric = this.core
+        .getCoreContainer()
+        .getMetricManager()
+        .registry("solr.jvm")
+        .getMetrics()
+        .get("os.systemCpuLoad");
+
+    if (metric == null) {
+        return -1.0;
+    }
+
+    if (metric instanceof Gauge) {
+      @SuppressWarnings({"rawtypes"})
+          Gauge gauge = (Gauge) metric;
+      // unwrap if needed
+      if (gauge instanceof SolrMetricManager.GaugeWrapper) {
+        gauge = ((SolrMetricManager.GaugeWrapper) gauge).getGauge();
+      }
+      return ((Double) gauge.getValue()).doubleValue();
+    }
+
+    return -1.0;                // Unable to unpack metric

Review comment:
       I checked. The caller will look for <0 values and warn-log "Unable to get CPU usage". I don't fancy special return values to signal errors when there could be a dedicated Exception and catch logic though.

##########
File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/LoadAverageCircuitBreaker.java
##########
@@ -0,0 +1,115 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.util.circuitbreaker;
+
+import java.lang.invoke.MethodHandles;
+import java.lang.management.ManagementFactory;
+import java.lang.management.OperatingSystemMXBean;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * <p>
+ * Tracks current system load average and triggers if the specified threshold is breached.
+ *
+ * This circuit breaker gets the load average (length of the run queue) over the last
+ * minute and uses that data to take a decision. We depend on OperatingSystemMXBean which does
+ * not allow a configurable interval of collection of data.
+ * //TODO: Use Codahale Meter to calculate the value locally.
+ * </p>
+ *
+ * <p>
+ * The configuration to define which mode to use and the trigger threshold are defined in
+ * solrconfig.xml
+ * </p>
+ */
+public class LoadAverageCircuitBreaker extends CircuitBreaker {
+  private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+  private static final OperatingSystemMXBean operatingSystemMXBean = ManagementFactory.getOperatingSystemMXBean();
+
+  private final boolean enabled;
+  private final double loadAverageThreshold;
+
+  // Assumption -- the value of these parameters will be set correctly before invoking getDebugInfo()
+  private static final ThreadLocal<Double> seenLoadAverage = ThreadLocal.withInitial(() -> 0.0);
+
+  private static final ThreadLocal<Double> allowedLoadAverage = ThreadLocal.withInitial(() -> 0.0);
+
+  public LoadAverageCircuitBreaker(CircuitBreakerConfig config) {
+    super(config);
+
+    this.enabled = config.getLoadAverageCBEnabled();
+    this.loadAverageThreshold = config.getLoadAverageCBThreshold();
+  }
+
+  @Override
+  public boolean isTripped() {
+    if (!isEnabled()) {
+      return false;
+    }
+
+    if (!enabled) {
+      return false;
+    }
+
+    double localAllowedLoadAverage = getLoadAverageThreshold();
+    double localSeenLoadAverage = calculateLiveLoadAverage();
+
+    if (localSeenLoadAverage < 0) {
+      if (log.isWarnEnabled()) {

Review comment:
       We could also defer further refactoring to a followup JIRA to split things up a bit. There may be other larger refactorings for 9.0 so perhaps there are synergies?

##########
File path: solr/core/src/test/org/apache/solr/util/TestCircuitBreaker.java
##########
@@ -199,7 +201,63 @@ public void testFakeCPUCircuitBreaker() {
       PluginInfo pluginInfo = h.getCore().getSolrConfig().getPluginInfo(CircuitBreakerManager.class.getName());
 
       CircuitBreaker.CircuitBreakerConfig circuitBreakerConfig = CircuitBreakerManager.buildCBConfig(pluginInfo);
-      CircuitBreaker circuitBreaker = new FakeCPUCircuitBreaker(circuitBreakerConfig);
+      CircuitBreaker circuitBreaker = new FakeCPUCircuitBreaker(circuitBreakerConfig, null);

Review comment:
       Ah, I miss Korlin and named non-positional arguments with default values! :) 

##########
File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/CircuitBreakerManager.java
##########
@@ -40,12 +41,15 @@
  * solution. There will be a follow up with a SIP for a schema API design.
  */
 public class CircuitBreakerManager implements PluginInfoInitialized {
+  private final SolrCore core;

Review comment:
       Can we make `CircuitBreaker` subclasses that need the core implement `SolrCoreAware`? That would be the most elegant instead of having to pass all context down through the manager.
   
   I'd also like to split up things even more to make CBs truly pluggable, and not managed by some common Manager that has to know about all plugins, and a common config class? But that's for another JIRA, and also a breaking change that is better suited for 9.0




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] atris commented on pull request #96: SOLR-15056: add circuit breaker for CPU, fix load circuit breaker

Posted by GitBox <gi...@apache.org>.
atris commented on pull request #96:
URL: https://github.com/apache/solr/pull/96#issuecomment-853358369


   Hi Christine,
   
   Yes, merging the two circuit breakers makes sense as long as we separate out the configurations distinctly.
   
   Alternative is to have the common code (and test code) in parent classes with both the circuit breakers and their infrastructure deriving from them.
   
   Your point about back compatibility is important. At work, we use the CPU circuit breaker, and I am sure many other users do, too.
   
   Renaming existing functionality will break that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] atris commented on a change in pull request #96: SOLR-15056: add circuit breaker for CPU, fix load circuit breaker

Posted by GitBox <gi...@apache.org>.
atris commented on a change in pull request #96:
URL: https://github.com/apache/solr/pull/96#discussion_r643437744



##########
File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/CPUCircuitBreaker.java
##########
@@ -110,6 +113,27 @@ public double getCpuUsageThreshold() {
   }
 
   protected double calculateLiveCPUUsage() {
-    return operatingSystemMXBean.getSystemLoadAverage();
+    Metric metric = this.core
+        .getCoreContainer()
+        .getMetricManager()
+        .registry("solr.jvm")
+        .getMetrics()
+        .get("os.systemCpuLoad");

Review comment:
       The load average represents CPU usage and resources waiting for CPU - - so the job queue will contribute to the load average.
   
   AFAIK, load average will always include CPU and CPU dependent tasks, which is a good representation of the load on the system.
   
   Nevertheless, I am not a stickler for the naming as long as the documentation is correct




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] wrunderwood commented on a change in pull request #96: SOLR-15056: add circuit breaker for CPU, fix load circuit breaker

Posted by GitBox <gi...@apache.org>.
wrunderwood commented on a change in pull request #96:
URL: https://github.com/apache/solr/pull/96#discussion_r643437430



##########
File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/LoadAverageCircuitBreaker.java
##########
@@ -0,0 +1,115 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.util.circuitbreaker;
+
+import java.lang.invoke.MethodHandles;
+import java.lang.management.ManagementFactory;
+import java.lang.management.OperatingSystemMXBean;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * <p>
+ * Tracks current system load average and triggers if the specified threshold is breached.
+ *
+ * This circuit breaker gets the load average (length of the run queue) over the last
+ * minute and uses that data to take a decision. We depend on OperatingSystemMXBean which does
+ * not allow a configurable interval of collection of data.
+ * //TODO: Use Codahale Meter to calculate the value locally.
+ * </p>
+ *
+ * <p>
+ * The configuration to define which mode to use and the trigger threshold are defined in
+ * solrconfig.xml
+ * </p>
+ */
+public class LoadAverageCircuitBreaker extends CircuitBreaker {
+  private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+  private static final OperatingSystemMXBean operatingSystemMXBean = ManagementFactory.getOperatingSystemMXBean();
+
+  private final boolean enabled;
+  private final double loadAverageThreshold;
+
+  // Assumption -- the value of these parameters will be set correctly before invoking getDebugInfo()
+  private static final ThreadLocal<Double> seenLoadAverage = ThreadLocal.withInitial(() -> 0.0);
+
+  private static final ThreadLocal<Double> allowedLoadAverage = ThreadLocal.withInitial(() -> 0.0);
+
+  public LoadAverageCircuitBreaker(CircuitBreakerConfig config) {
+    super(config);
+
+    this.enabled = config.getLoadAverageCBEnabled();
+    this.loadAverageThreshold = config.getLoadAverageCBThreshold();
+  }
+
+  @Override
+  public boolean isTripped() {
+    if (!isEnabled()) {
+      return false;
+    }
+
+    if (!enabled) {
+      return false;
+    }
+
+    double localAllowedLoadAverage = getLoadAverageThreshold();
+    double localSeenLoadAverage = calculateLiveLoadAverage();
+
+    if (localSeenLoadAverage < 0) {
+      if (log.isWarnEnabled()) {

Review comment:
       We disagree about coding style.
   
   This fixes a bug. The existing CPU circuit breaker really is not usable because the inputs are limited to the wrong values.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] atris commented on pull request #96: SOLR-15056: add circuit breaker for CPU, fix load circuit breaker

Posted by GitBox <gi...@apache.org>.
atris commented on pull request #96:
URL: https://github.com/apache/solr/pull/96#issuecomment-859671978


   > Thanks for all the suggestions. I'll do some work on this and come back with a new PR.
   > 
   > * Use SolrCoreAware.
   > * Check input threshold values, the only reason this is usable is that they aren't checked.
   
   I am confused by your second point -- I thought your initial suggestion was that the threshold should be unbounded because the load average metric is unbounded (which is the case right now).
   
   So what exactly are you proposing to check, please? I agree that checking that the value is greater than zero, but did you have anything else in mind?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] wrunderwood commented on a change in pull request #96: SOLR-15056: add circuit breaker for CPU, fix load circuit breaker

Posted by GitBox <gi...@apache.org>.
wrunderwood commented on a change in pull request #96:
URL: https://github.com/apache/solr/pull/96#discussion_r643424082



##########
File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/CircuitBreaker.java
##########
@@ -66,14 +66,19 @@ protected boolean isEnabled() {
     private final int memCBThreshold;
     private final boolean cpuCBEnabled;
     private final int cpuCBThreshold;
+    private final boolean loadAverageCBEnabled;
+    private final double loadAverageCBThreshold;
 
     public CircuitBreakerConfig(final boolean enabled, final boolean memCBEnabled, final int memCBThreshold,
-                                  final boolean cpuCBEnabled, final int cpuCBThreshold) {
+                                final boolean cpuCBEnabled, final int cpuCBThreshold,

Review comment:
       I don't understand this comment.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] atris commented on a change in pull request #96: SOLR-15056: add circuit breaker for CPU, fix load circuit breaker

Posted by GitBox <gi...@apache.org>.
atris commented on a change in pull request #96:
URL: https://github.com/apache/solr/pull/96#discussion_r643426496



##########
File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/CircuitBreaker.java
##########
@@ -66,14 +66,19 @@ protected boolean isEnabled() {
     private final int memCBThreshold;
     private final boolean cpuCBEnabled;
     private final int cpuCBThreshold;
+    private final boolean loadAverageCBEnabled;
+    private final double loadAverageCBThreshold;
 
     public CircuitBreakerConfig(final boolean enabled, final boolean memCBEnabled, final int memCBThreshold,
-                                  final boolean cpuCBEnabled, final int cpuCBThreshold) {
+                                final boolean cpuCBEnabled, final int cpuCBThreshold,

Review comment:
       Please rename the circuit breaker and it's relevant parameters to reflect the metric that the circuit breaker is using




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] wrunderwood commented on pull request #96: SOLR-15056: add circuit breaker for CPU, fix load circuit breaker

Posted by GitBox <gi...@apache.org>.
wrunderwood commented on pull request #96:
URL: https://github.com/apache/solr/pull/96#issuecomment-849021275


   The existing CPU circuit breaker has an incorrect name, so yes, it is moved to a more accurate name.
   
   Load average can include things besides CPU. In some systems, it includes threads in iowait. So it should not have “CPU” in the name.
   
   The documentation explains what happens when the CPU circuit breaker is not available.
   
   Yes, the tests are similar because the two circuit breakers take similar inputs.
   
   wunder
   Walter Underwood
   ***@***.***
   http://observer.wunderwood.org/  (my blog)
   
   > On May 26, 2021, at 3:43 AM, Atri Sharma ***@***.***> wrote:
   > 
   > 
   > @atris requested changes on this pull request.
   > 
   > In general, I see some code duplication that should be abstracted. Please see the comments, thanks!
   > 
   > In solr/core/src/java/org/apache/solr/util/circuitbreaker/CPUCircuitBreaker.java <https://github.com/apache/solr/pull/96#discussion_r639600967>:
   > 
   > > @@ -110,6 +113,27 @@ public double getCpuUsageThreshold() {
   >    }
   >  
   >    protected double calculateLiveCPUUsage() {
   > -    return operatingSystemMXBean.getSystemLoadAverage();
   > +    Metric metric = this.core
   > +        .getCoreContainer()
   > +        .getMetricManager()
   > +        .registry("solr.jvm")
   > +        .getMetrics()
   > +        .get("os.systemCpuLoad");
   > I dont quite understand this change -- so you are replacing the default CPU circuit breaker with a new implementation and moving the earlier implementation to a new class?
   > 
   > IMO, both the circuit breakers operate on CPU, using different metrics. Both circuit breakers should be named in a way which represents the fact that they operate on CPU and highlight which metric they depend on.
   > 
   > In solr/core/src/java/org/apache/solr/util/circuitbreaker/CPUCircuitBreaker.java <https://github.com/apache/solr/pull/96#discussion_r639601381>:
   > 
   > > +
   > +    if (metric == null) {
   > +        return -1.0;
   > +    }
   > +
   > +    if (metric instanceof Gauge) {
   > +      @SuppressWarnings({"rawtypes"})
   > +          Gauge gauge = (Gauge) metric;
   > +      // unwrap if needed
   > +      if (gauge instanceof SolrMetricManager.GaugeWrapper) {
   > +        gauge = ((SolrMetricManager.GaugeWrapper) gauge).getGauge();
   > +      }
   > +      return ((Double) gauge.getValue()).doubleValue();
   > +    }
   > +
   > +    return -1.0;                // Unable to unpack metric
   > Please log a warning here -- we should not silently project that the system is not under load
   > 
   > In solr/core/src/java/org/apache/solr/util/circuitbreaker/CircuitBreaker.java <https://github.com/apache/solr/pull/96#discussion_r639601638>:
   > 
   > >  
   >      public CircuitBreakerConfig(final boolean enabled, final boolean memCBEnabled, final int memCBThreshold,
   > -                                  final boolean cpuCBEnabled, final int cpuCBThreshold) {
   > +                                final boolean cpuCBEnabled, final int cpuCBThreshold,
   > Rename to highlight the metric (like you did for loadAverageCB renaming)
   > 
   > In solr/core/src/java/org/apache/solr/util/circuitbreaker/CircuitBreakerManager.java <https://github.com/apache/solr/pull/96#discussion_r639602156>:
   > 
   > > @@ -40,12 +41,15 @@
   >   * solution. There will be a follow up with a SIP for a schema API design.
   >   */
   >  public class CircuitBreakerManager implements PluginInfoInitialized {
   > +  private final SolrCore core;
   > I am wary of adding a SolrCore dependency here since only one circuit breaker really uses it and this pollutes the API a fair bit.
   > 
   > In solr/core/src/java/org/apache/solr/util/circuitbreaker/LoadAverageCircuitBreaker.java <https://github.com/apache/solr/pull/96#discussion_r639603549>:
   > 
   > > +
   > +  @Override
   > +  public boolean isTripped() {
   > +    if (!isEnabled()) {
   > +      return false;
   > +    }
   > +
   > +    if (!enabled) {
   > +      return false;
   > +    }
   > +
   > +    double localAllowedLoadAverage = getLoadAverageThreshold();
   > +    double localSeenLoadAverage = calculateLiveLoadAverage();
   > +
   > +    if (localSeenLoadAverage < 0) {
   > +      if (log.isWarnEnabled()) {
   > There is a fair bit of overlap between the two circuit breakers -- can we move the common code and methods to a super class, and define the contract which both (and future) CPU circuit breakers need to adhere to?
   > 
   > In solr/solr-ref-guide/src/circuit-breakers.adoc <https://github.com/apache/solr/pull/96#discussion_r639604003>:
   > 
   > > +recent load average for the whole system. A "load average" is the number of processes using or waiting for a CPU,
   > +usually averaged over one minute. Some systems include processes waiting on IO in the load average. Check the
   > +documentation for your system and JVM to understand this metric. For more information, see the
   > +https://en.wikipedia.org/wiki/Load_(computing)[Wikipedia page for Load],
   > +
   > +Configuration for load average circuit breaker:
   > +
   > +[source,xml]
   > +----
   > +<str name="loadAverageEnabled">true</str>
   > +----
   > +
   > +Note that this configuration will be overridden by the global circuit breaker flag -- if circuit breakers are disabled, this flag
   > +will not help you.
   > +
   > +The triggering threshold is a floating point number matching load average. This circuit breaker will trip when the
   > Not reviewing these changes -- will depend on Cassandra's review at some point
   > 
   > In solr/core/src/test/org/apache/solr/util/TestCircuitBreaker.java <https://github.com/apache/solr/pull/96#discussion_r639604900>:
   > 
   > > @@ -199,7 +201,63 @@ public void testFakeCPUCircuitBreaker() {
   >        PluginInfo pluginInfo = h.getCore().getSolrConfig().getPluginInfo(CircuitBreakerManager.class.getName());
   >  
   >        CircuitBreaker.CircuitBreakerConfig circuitBreakerConfig = CircuitBreakerManager.buildCBConfig(pluginInfo);
   > -      CircuitBreaker circuitBreaker = new FakeCPUCircuitBreaker(circuitBreakerConfig);
   > +      CircuitBreaker circuitBreaker = new FakeCPUCircuitBreaker(circuitBreakerConfig, null);
   > Please add a comment representing what the null stands for
   > 
   > In solr/core/src/test/org/apache/solr/util/TestCircuitBreaker.java <https://github.com/apache/solr/pull/96#discussion_r639605898>:
   > 
   > > +    } finally {
   > +      if (!executor.isShutdown()) {
   > +        executor.shutdown();
   > +      }
   > +    }
   > +  }
   > +
   > +  public void testFakeLoadAverageCircuitBreaker() {
   > +    AtomicInteger failureCount = new AtomicInteger();
   > +
   > +    ExecutorService executor = ExecutorUtil.newMDCAwareCachedThreadPool(
   > +        new SolrNamedThreadFactory("TestCircuitBreaker"));
   > +    try {
   > +      removeAllExistingCircuitBreakers();
   > +
   > +      PluginInfo pluginInfo = h.getCore().getSolrConfig().getPluginInfo(CircuitBreakerManager.class.getName());
   > Isnt this test doing the exact same thing as the original CPU circuit breaker test, only using a different class?
   > 
   > In solr/core/src/test/org/apache/solr/util/TestCircuitBreaker.java <https://github.com/apache/solr/pull/96#discussion_r639606722>:
   > 
   > >      }
   >  
   >      @Override
   >      protected double calculateLiveCPUUsage() {
   >        return 92; // Return a value large enough to trigger the circuit breaker
   >      }
   >    }
   > +
   > +  private static class FakeLoadAverageCircuitBreaker extends LoadAverageCircuitBreaker {
   > +    public FakeLoadAverageCircuitBreaker(CircuitBreakerConfig config) {
   > Again, in this class not an exact copy of FakeCPUCircuitBreaker?
   > 
   > —
   > You are receiving this because you authored the thread.
   > Reply to this email directly, view it on GitHub <https://github.com/apache/solr/pull/96#pullrequestreview-668855220>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ACR4IINRJPCTF5S4PNJZEK3TPTGD3ANCNFSM43T5FIZA>.
   > 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] wrunderwood commented on a change in pull request #96: SOLR-15056: add circuit breaker for CPU, fix load circuit breaker

Posted by GitBox <gi...@apache.org>.
wrunderwood commented on a change in pull request #96:
URL: https://github.com/apache/solr/pull/96#discussion_r643425617



##########
File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/CircuitBreakerManager.java
##########
@@ -40,12 +41,15 @@
  * solution. There will be a follow up with a SIP for a schema API design.
  */
 public class CircuitBreakerManager implements PluginInfoInitialized {
+  private final SolrCore core;

Review comment:
       Please explain a different way to do it. The core is needed in order to look up the metric.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] atris edited a comment on pull request #96: SOLR-15056: add circuit breaker for CPU, fix load circuit breaker

Posted by GitBox <gi...@apache.org>.
atris edited a comment on pull request #96:
URL: https://github.com/apache/solr/pull/96#issuecomment-859671978


   > Thanks for all the suggestions. I'll do some work on this and come back with a new PR.
   > 
   > * Use SolrCoreAware.
   > * Check input threshold values, the only reason this is usable is that they aren't checked.
   
   I am confused by your second point -- I thought your initial suggestion was that the threshold should be unbounded because the load average metric is unbounded (which is the case right now).
   
   So what exactly are you proposing to check, please? I agree that checking that the value is greater than zero is valuable, but did you have anything else in mind?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] atris commented on a change in pull request #96: SOLR-15056: add circuit breaker for CPU, fix load circuit breaker

Posted by GitBox <gi...@apache.org>.
atris commented on a change in pull request #96:
URL: https://github.com/apache/solr/pull/96#discussion_r643429115



##########
File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/CircuitBreakerManager.java
##########
@@ -40,12 +41,15 @@
  * solution. There will be a follow up with a SIP for a schema API design.
  */
 public class CircuitBreakerManager implements PluginInfoInitialized {
+  private final SolrCore core;

Review comment:
       Extend the manager class? Introdyxe a builder and have an implementation that takes the core?
   
   We cannot increase the API surface for every new circuit breaker - - the manager has to be as minimal as possible 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] wrunderwood commented on a change in pull request #96: SOLR-15056: add circuit breaker for CPU, fix load circuit breaker

Posted by GitBox <gi...@apache.org>.
wrunderwood commented on a change in pull request #96:
URL: https://github.com/apache/solr/pull/96#discussion_r643426704



##########
File path: solr/solr-ref-guide/src/circuit-breakers.adoc
##########
@@ -94,16 +114,45 @@ Configuration for CPU utilization based circuit breaker:
 Note that this configuration will be overridden by the global circuit breaker flag -- if circuit breakers are disabled, this flag
 will not help you.
 
-The triggering threshold is defined in units of CPU utilization. The configuration to control this is as below:
+The triggering threshold is defined in percent CPU usage. A value of "0" maps to 0% usage 
+and a value of "100" maps to 100% usage.  This circuit breaker will trip when the CPU usage is
+equal to or greater than 75%:
 
 [source,xml]
 ----
 <str name="cpuThreshold">75</str>
 ----
 
+=== System Load Average Circuit Breaker
+This circuit breaker tracks system load average and triggers if the recent load average exceeds a configurable threshold.
+
+This is tracked with the JMX metric `OperatingSystemMXBean.getSystemLoadAverage()`. That measures the
+recent load average for the whole system. A "load average" is the number of processes using or waiting for a CPU,
+usually averaged over one minute. Some systems include processes waiting on IO in the load average. Check the
+documentation for your system and JVM to understand this metric. For more information, see the
+https://en.wikipedia.org/wiki/Load_(computing)[Wikipedia page for Load],
+
+Configuration for load average circuit breaker:
+
+[source,xml]
+----
+<str name="loadAverageEnabled">true</str>
+----
+
+Note that this configuration will be overridden by the global circuit breaker flag -- if circuit breakers are disabled, this flag
+will not help you.
+
+The triggering threshold is a floating point number matching load average. This circuit breaker will trip when the

Review comment:
       Cassandra already reviewed the changes in March and I included her recommended changes. https://issues.apache.org/jira/browse/SOLR-15056?focusedCommentId=17307856&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17307856




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] cpoerschke commented on a change in pull request #96: SOLR-15056: add circuit breaker for CPU, fix load circuit breaker

Posted by GitBox <gi...@apache.org>.
cpoerschke commented on a change in pull request #96:
URL: https://github.com/apache/solr/pull/96#discussion_r650089050



##########
File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/CircuitBreakerManager.java
##########
@@ -40,12 +41,15 @@
  * solution. There will be a follow up with a SIP for a schema API design.
  */
 public class CircuitBreakerManager implements PluginInfoInitialized {
+  private final SolrCore core;

Review comment:
       > ... to make CBs truly pluggable, ... But that's for another JIRA ...
   
   i've created a placeholder JIRA: https://issues.apache.org/jira/browse/SOLR-15474
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] wrunderwood commented on a change in pull request #96: SOLR-15056: add circuit breaker for CPU, fix load circuit breaker

Posted by GitBox <gi...@apache.org>.
wrunderwood commented on a change in pull request #96:
URL: https://github.com/apache/solr/pull/96#discussion_r643435567



##########
File path: solr/core/src/test/org/apache/solr/util/TestCircuitBreaker.java
##########
@@ -199,7 +201,63 @@ public void testFakeCPUCircuitBreaker() {
       PluginInfo pluginInfo = h.getCore().getSolrConfig().getPluginInfo(CircuitBreakerManager.class.getName());
 
       CircuitBreaker.CircuitBreakerConfig circuitBreakerConfig = CircuitBreakerManager.buildCBConfig(pluginInfo);
-      CircuitBreaker circuitBreaker = new FakeCPUCircuitBreaker(circuitBreakerConfig);
+      CircuitBreaker circuitBreaker = new FakeCPUCircuitBreaker(circuitBreakerConfig, null);
+
+      h.getCore().getCircuitBreakerManager().register(circuitBreaker);
+
+      List<Future<?>> futures = new ArrayList<>();
+
+      for (int i = 0; i < 5; i++) {
+        Future<?> future = executor.submit(() -> {
+          try {
+            h.query(req("name:\"john smith\""));
+          } catch (SolrException e) {
+            assertThat(e.getMessage(), containsString("Circuit Breakers tripped"));
+            failureCount.incrementAndGet();
+          } catch (Exception e) {
+            throw new RuntimeException(e.getMessage());
+          }
+        });
+
+        futures.add(future);
+      }
+
+      for  (Future<?> future : futures) {
+        try {
+          future.get();
+        } catch (Exception e) {
+          throw new RuntimeException(e.getMessage());
+        }
+      }
+
+      executor.shutdown();
+      try {
+        executor.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
+      } catch (InterruptedException e) {
+        Thread.currentThread().interrupt();
+        throw new RuntimeException(e.getMessage());
+      }
+
+      assertEquals("Number of failed queries is not correct",5, failureCount.get());
+    } finally {
+      if (!executor.isShutdown()) {
+        executor.shutdown();
+      }
+    }
+  }
+
+  public void testFakeLoadAverageCircuitBreaker() {
+    AtomicInteger failureCount = new AtomicInteger();
+
+    ExecutorService executor = ExecutorUtil.newMDCAwareCachedThreadPool(
+        new SolrNamedThreadFactory("TestCircuitBreaker"));
+    try {
+      removeAllExistingCircuitBreakers();
+
+      PluginInfo pluginInfo = h.getCore().getSolrConfig().getPluginInfo(CircuitBreakerManager.class.getName());

Review comment:
       Yes, because the original class has been renamed to an accurate name, so this test is renamed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] janhoy commented on a change in pull request #96: SOLR-15056: add circuit breaker for CPU, fix load circuit breaker

Posted by GitBox <gi...@apache.org>.
janhoy commented on a change in pull request #96:
URL: https://github.com/apache/solr/pull/96#discussion_r644574952



##########
File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/CircuitBreakerManager.java
##########
@@ -40,12 +41,15 @@
  * solution. There will be a follow up with a SIP for a schema API design.
  */
 public class CircuitBreakerManager implements PluginInfoInitialized {
+  private final SolrCore core;

Review comment:
       There should be some flag on the Threadpool I think, whether the request is internal (initiated by Solr) or external. Don't recall the details.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] wrunderwood commented on a change in pull request #96: SOLR-15056: add circuit breaker for CPU, fix load circuit breaker

Posted by GitBox <gi...@apache.org>.
wrunderwood commented on a change in pull request #96:
URL: https://github.com/apache/solr/pull/96#discussion_r643433720



##########
File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/CPUCircuitBreaker.java
##########
@@ -110,6 +113,27 @@ public double getCpuUsageThreshold() {
   }
 
   protected double calculateLiveCPUUsage() {
-    return operatingSystemMXBean.getSystemLoadAverage();
+    Metric metric = this.core
+        .getCoreContainer()
+        .getMetricManager()
+        .registry("solr.jvm")
+        .getMetrics()
+        .get("os.systemCpuLoad");

Review comment:
       The existing CPU CB is actually a load average CB. Load average is implented differently in different OSes and is not always purely CPU. Also, load average is no limited to 0-100 as documented.
   
   The existing CB is renamed to an accurate name with accurate documentation.
   
   A new CB is created that does what the original CB was documented to do, based on CPU usage instead of load average. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] wrunderwood commented on pull request #96: SOLR-15056: add circuit breaker for CPU, fix load circuit breaker

Posted by GitBox <gi...@apache.org>.
wrunderwood commented on pull request #96:
URL: https://github.com/apache/solr/pull/96#issuecomment-859667485






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] janhoy commented on a change in pull request #96: SOLR-15056: add circuit breaker for CPU, fix load circuit breaker

Posted by GitBox <gi...@apache.org>.
janhoy commented on a change in pull request #96:
URL: https://github.com/apache/solr/pull/96#discussion_r644375943



##########
File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/CircuitBreaker.java
##########
@@ -66,14 +66,19 @@ protected boolean isEnabled() {
     private final int memCBThreshold;
     private final boolean cpuCBEnabled;
     private final int cpuCBThreshold;
+    private final boolean loadAverageCBEnabled;
+    private final double loadAverageCBThreshold;
 
     public CircuitBreakerConfig(final boolean enabled, final boolean memCBEnabled, final int memCBThreshold,
-                                  final boolean cpuCBEnabled, final int cpuCBThreshold) {
+                                final boolean cpuCBEnabled, final int cpuCBThreshold,

Review comment:
       After this change there will be tree CBs available: `memCBEnabled`, `loadAverageCBEnabled` (previously cpuCBEnabled), and `cpuCBEnabled` (the new one). So this should be correct, not?

##########
File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/CPUCircuitBreaker.java
##########
@@ -110,6 +113,27 @@ public double getCpuUsageThreshold() {
   }
 
   protected double calculateLiveCPUUsage() {
-    return operatingSystemMXBean.getSystemLoadAverage();
+    Metric metric = this.core
+        .getCoreContainer()
+        .getMetricManager()
+        .registry("solr.jvm")
+        .getMetrics()
+        .get("os.systemCpuLoad");
+
+    if (metric == null) {
+        return -1.0;
+    }
+
+    if (metric instanceof Gauge) {
+      @SuppressWarnings({"rawtypes"})
+          Gauge gauge = (Gauge) metric;
+      // unwrap if needed
+      if (gauge instanceof SolrMetricManager.GaugeWrapper) {
+        gauge = ((SolrMetricManager.GaugeWrapper) gauge).getGauge();
+      }
+      return ((Double) gauge.getValue()).doubleValue();
+    }
+
+    return -1.0;                // Unable to unpack metric

Review comment:
       I checked. The caller will look for <0 values and warn-log "Unable to get CPU usage". I don't fancy special return values to signal errors when there could be a dedicated Exception and catch logic though.

##########
File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/LoadAverageCircuitBreaker.java
##########
@@ -0,0 +1,115 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.util.circuitbreaker;
+
+import java.lang.invoke.MethodHandles;
+import java.lang.management.ManagementFactory;
+import java.lang.management.OperatingSystemMXBean;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * <p>
+ * Tracks current system load average and triggers if the specified threshold is breached.
+ *
+ * This circuit breaker gets the load average (length of the run queue) over the last
+ * minute and uses that data to take a decision. We depend on OperatingSystemMXBean which does
+ * not allow a configurable interval of collection of data.
+ * //TODO: Use Codahale Meter to calculate the value locally.
+ * </p>
+ *
+ * <p>
+ * The configuration to define which mode to use and the trigger threshold are defined in
+ * solrconfig.xml
+ * </p>
+ */
+public class LoadAverageCircuitBreaker extends CircuitBreaker {
+  private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+  private static final OperatingSystemMXBean operatingSystemMXBean = ManagementFactory.getOperatingSystemMXBean();
+
+  private final boolean enabled;
+  private final double loadAverageThreshold;
+
+  // Assumption -- the value of these parameters will be set correctly before invoking getDebugInfo()
+  private static final ThreadLocal<Double> seenLoadAverage = ThreadLocal.withInitial(() -> 0.0);
+
+  private static final ThreadLocal<Double> allowedLoadAverage = ThreadLocal.withInitial(() -> 0.0);
+
+  public LoadAverageCircuitBreaker(CircuitBreakerConfig config) {
+    super(config);
+
+    this.enabled = config.getLoadAverageCBEnabled();
+    this.loadAverageThreshold = config.getLoadAverageCBThreshold();
+  }
+
+  @Override
+  public boolean isTripped() {
+    if (!isEnabled()) {
+      return false;
+    }
+
+    if (!enabled) {
+      return false;
+    }
+
+    double localAllowedLoadAverage = getLoadAverageThreshold();
+    double localSeenLoadAverage = calculateLiveLoadAverage();
+
+    if (localSeenLoadAverage < 0) {
+      if (log.isWarnEnabled()) {

Review comment:
       We could also defer further refactoring to a followup JIRA to split things up a bit. There may be other larger refactorings for 9.0 so perhaps there are synergies?

##########
File path: solr/core/src/test/org/apache/solr/util/TestCircuitBreaker.java
##########
@@ -199,7 +201,63 @@ public void testFakeCPUCircuitBreaker() {
       PluginInfo pluginInfo = h.getCore().getSolrConfig().getPluginInfo(CircuitBreakerManager.class.getName());
 
       CircuitBreaker.CircuitBreakerConfig circuitBreakerConfig = CircuitBreakerManager.buildCBConfig(pluginInfo);
-      CircuitBreaker circuitBreaker = new FakeCPUCircuitBreaker(circuitBreakerConfig);
+      CircuitBreaker circuitBreaker = new FakeCPUCircuitBreaker(circuitBreakerConfig, null);

Review comment:
       Ah, I miss Korlin and named non-positional arguments with default values! :) 

##########
File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/CircuitBreakerManager.java
##########
@@ -40,12 +41,15 @@
  * solution. There will be a follow up with a SIP for a schema API design.
  */
 public class CircuitBreakerManager implements PluginInfoInitialized {
+  private final SolrCore core;

Review comment:
       Can we make `CircuitBreaker` subclasses that need the core implement `SolrCoreAware`? That would be the most elegant instead of having to pass all context down through the manager.
   
   I'd also like to split up things even more to make CBs truly pluggable, and not managed by some common Manager that has to know about all plugins, and a common config class? But that's for another JIRA, and also a breaking change that is better suited for 9.0

##########
File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/CircuitBreakerManager.java
##########
@@ -40,12 +41,15 @@
  * solution. There will be a follow up with a SIP for a schema API design.
  */
 public class CircuitBreakerManager implements PluginInfoInitialized {
+  private final SolrCore core;

Review comment:
       There should be some flag on the Threadpool I think, whether the request is internal (initiated by Solr) or external. Don't recall the details.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] atris commented on a change in pull request #96: SOLR-15056: add circuit breaker for CPU, fix load circuit breaker

Posted by GitBox <gi...@apache.org>.
atris commented on a change in pull request #96:
URL: https://github.com/apache/solr/pull/96#discussion_r643429971



##########
File path: solr/solr-ref-guide/src/circuit-breakers.adoc
##########
@@ -94,16 +114,45 @@ Configuration for CPU utilization based circuit breaker:
 Note that this configuration will be overridden by the global circuit breaker flag -- if circuit breakers are disabled, this flag
 will not help you.
 
-The triggering threshold is defined in units of CPU utilization. The configuration to control this is as below:
+The triggering threshold is defined in percent CPU usage. A value of "0" maps to 0% usage 
+and a value of "100" maps to 100% usage.  This circuit breaker will trip when the CPU usage is
+equal to or greater than 75%:
 
 [source,xml]
 ----
 <str name="cpuThreshold">75</str>
 ----
 
+=== System Load Average Circuit Breaker
+This circuit breaker tracks system load average and triggers if the recent load average exceeds a configurable threshold.
+
+This is tracked with the JMX metric `OperatingSystemMXBean.getSystemLoadAverage()`. That measures the
+recent load average for the whole system. A "load average" is the number of processes using or waiting for a CPU,
+usually averaged over one minute. Some systems include processes waiting on IO in the load average. Check the
+documentation for your system and JVM to understand this metric. For more information, see the
+https://en.wikipedia.org/wiki/Load_(computing)[Wikipedia page for Load],
+
+Configuration for load average circuit breaker:
+
+[source,xml]
+----
+<str name="loadAverageEnabled">true</str>
+----
+
+Note that this configuration will be overridden by the global circuit breaker flag -- if circuit breakers are disabled, this flag
+will not help you.
+
+The triggering threshold is a floating point number matching load average. This circuit breaker will trip when the

Review comment:
       Cool, then this stands resolved




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] wrunderwood commented on a change in pull request #96: SOLR-15056: add circuit breaker for CPU, fix load circuit breaker

Posted by GitBox <gi...@apache.org>.
wrunderwood commented on a change in pull request #96:
URL: https://github.com/apache/solr/pull/96#discussion_r643432049



##########
File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/LoadAverageCircuitBreaker.java
##########
@@ -0,0 +1,115 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.util.circuitbreaker;
+
+import java.lang.invoke.MethodHandles;
+import java.lang.management.ManagementFactory;
+import java.lang.management.OperatingSystemMXBean;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * <p>
+ * Tracks current system load average and triggers if the specified threshold is breached.
+ *
+ * This circuit breaker gets the load average (length of the run queue) over the last
+ * minute and uses that data to take a decision. We depend on OperatingSystemMXBean which does
+ * not allow a configurable interval of collection of data.
+ * //TODO: Use Codahale Meter to calculate the value locally.
+ * </p>
+ *
+ * <p>
+ * The configuration to define which mode to use and the trigger threshold are defined in
+ * solrconfig.xml
+ * </p>
+ */
+public class LoadAverageCircuitBreaker extends CircuitBreaker {
+  private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+  private static final OperatingSystemMXBean operatingSystemMXBean = ManagementFactory.getOperatingSystemMXBean();
+
+  private final boolean enabled;
+  private final double loadAverageThreshold;
+
+  // Assumption -- the value of these parameters will be set correctly before invoking getDebugInfo()
+  private static final ThreadLocal<Double> seenLoadAverage = ThreadLocal.withInitial(() -> 0.0);
+
+  private static final ThreadLocal<Double> allowedLoadAverage = ThreadLocal.withInitial(() -> 0.0);
+
+  public LoadAverageCircuitBreaker(CircuitBreakerConfig config) {
+    super(config);
+
+    this.enabled = config.getLoadAverageCBEnabled();
+    this.loadAverageThreshold = config.getLoadAverageCBThreshold();
+  }
+
+  @Override
+  public boolean isTripped() {
+    if (!isEnabled()) {
+      return false;
+    }
+
+    if (!enabled) {
+      return false;
+    }
+
+    double localAllowedLoadAverage = getLoadAverageThreshold();
+    double localSeenLoadAverage = calculateLiveLoadAverage();
+
+    if (localSeenLoadAverage < 0) {
+      if (log.isWarnEnabled()) {

Review comment:
       An extra base class only for CPU actually makes the code base more complicated, not less. The other circuit breaker is a memory CB, so this cannot be shared with those.
   
   I can't think of another CPU breaker, so let's follow YAGNI and avoid introducing extra base classes "just in case".
   
   Some quick line counts show that the two files have 115 and 139 lines. A quick diff | wc shows 86 differences between the two files. So there isn't as much overlap as it might appear.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] atris commented on a change in pull request #96: SOLR-15056: add circuit breaker for CPU, fix load circuit breaker

Posted by GitBox <gi...@apache.org>.
atris commented on a change in pull request #96:
URL: https://github.com/apache/solr/pull/96#discussion_r643439528



##########
File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/LoadAverageCircuitBreaker.java
##########
@@ -0,0 +1,115 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.util.circuitbreaker;
+
+import java.lang.invoke.MethodHandles;
+import java.lang.management.ManagementFactory;
+import java.lang.management.OperatingSystemMXBean;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * <p>
+ * Tracks current system load average and triggers if the specified threshold is breached.
+ *
+ * This circuit breaker gets the load average (length of the run queue) over the last
+ * minute and uses that data to take a decision. We depend on OperatingSystemMXBean which does
+ * not allow a configurable interval of collection of data.
+ * //TODO: Use Codahale Meter to calculate the value locally.
+ * </p>
+ *
+ * <p>
+ * The configuration to define which mode to use and the trigger threshold are defined in
+ * solrconfig.xml
+ * </p>
+ */
+public class LoadAverageCircuitBreaker extends CircuitBreaker {
+  private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+  private static final OperatingSystemMXBean operatingSystemMXBean = ManagementFactory.getOperatingSystemMXBean();
+
+  private final boolean enabled;
+  private final double loadAverageThreshold;
+
+  // Assumption -- the value of these parameters will be set correctly before invoking getDebugInfo()
+  private static final ThreadLocal<Double> seenLoadAverage = ThreadLocal.withInitial(() -> 0.0);
+
+  private static final ThreadLocal<Double> allowedLoadAverage = ThreadLocal.withInitial(() -> 0.0);
+
+  public LoadAverageCircuitBreaker(CircuitBreakerConfig config) {
+    super(config);
+
+    this.enabled = config.getLoadAverageCBEnabled();
+    this.loadAverageThreshold = config.getLoadAverageCBThreshold();
+  }
+
+  @Override
+  public boolean isTripped() {
+    if (!isEnabled()) {
+      return false;
+    }
+
+    if (!enabled) {
+      return false;
+    }
+
+    double localAllowedLoadAverage = getLoadAverageThreshold();
+    double localSeenLoadAverage = calculateLiveLoadAverage();
+
+    if (localSeenLoadAverage < 0) {
+      if (log.isWarnEnabled()) {

Review comment:
       I don't understand your comment.
   
    Regardless of how we interpret the intention of the two circuit breakers, the fact remains that they have significant API and code overlap, which goes against any coding guideline




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] atris commented on a change in pull request #96: SOLR-15056: add circuit breaker for CPU, fix load circuit breaker

Posted by GitBox <gi...@apache.org>.
atris commented on a change in pull request #96:
URL: https://github.com/apache/solr/pull/96#discussion_r643434811



##########
File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/LoadAverageCircuitBreaker.java
##########
@@ -0,0 +1,115 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.util.circuitbreaker;
+
+import java.lang.invoke.MethodHandles;
+import java.lang.management.ManagementFactory;
+import java.lang.management.OperatingSystemMXBean;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * <p>
+ * Tracks current system load average and triggers if the specified threshold is breached.
+ *
+ * This circuit breaker gets the load average (length of the run queue) over the last
+ * minute and uses that data to take a decision. We depend on OperatingSystemMXBean which does
+ * not allow a configurable interval of collection of data.
+ * //TODO: Use Codahale Meter to calculate the value locally.
+ * </p>
+ *
+ * <p>
+ * The configuration to define which mode to use and the trigger threshold are defined in
+ * solrconfig.xml
+ * </p>
+ */
+public class LoadAverageCircuitBreaker extends CircuitBreaker {
+  private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+  private static final OperatingSystemMXBean operatingSystemMXBean = ManagementFactory.getOperatingSystemMXBean();
+
+  private final boolean enabled;
+  private final double loadAverageThreshold;
+
+  // Assumption -- the value of these parameters will be set correctly before invoking getDebugInfo()
+  private static final ThreadLocal<Double> seenLoadAverage = ThreadLocal.withInitial(() -> 0.0);
+
+  private static final ThreadLocal<Double> allowedLoadAverage = ThreadLocal.withInitial(() -> 0.0);
+
+  public LoadAverageCircuitBreaker(CircuitBreakerConfig config) {
+    super(config);
+
+    this.enabled = config.getLoadAverageCBEnabled();
+    this.loadAverageThreshold = config.getLoadAverageCBThreshold();
+  }
+
+  @Override
+  public boolean isTripped() {
+    if (!isEnabled()) {
+      return false;
+    }
+
+    if (!enabled) {
+      return false;
+    }
+
+    double localAllowedLoadAverage = getLoadAverageThreshold();
+    double localSeenLoadAverage = calculateLiveLoadAverage();
+
+    if (localSeenLoadAverage < 0) {
+      if (log.isWarnEnabled()) {

Review comment:
       I respect fully disagree.
   
   The files have common methods names - which make perfect candidates for a common interface, for eg.
   
   Also, the tests are almost a copy paste of the existing.
   
   This is not a just in case - two concrete circuit breakers share significant code. This can be improved, so should be




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] wrunderwood commented on pull request #96: SOLR-15056: add circuit breaker for CPU, fix load circuit breaker

Posted by GitBox <gi...@apache.org>.
wrunderwood commented on pull request #96:
URL: https://github.com/apache/solr/pull/96#issuecomment-853452250


   I apologize for the personal comment.
   
   The documentation and name describe a CPU utilization circuit breaker. The implementation is a load average metric. Load average is not a CPU utilization metric. The limits on the circuit breaker don’t match the range of the metric for load average.
   
   The implementation makes an incorrect assumption. That is a bug. It is actually a double bug. The implementation does not match the documentation, but if you figure out the implementation, the limit on thresholds makes it unusable on multi-CPU systems.
   
   wunder
   Walter Underwood
   ***@***.***
   http://observer.wunderwood.org/  (my blog)
   
   > On Jun 2, 2021, at 4:15 PM, Jan Høydahl ***@***.***> wrote:
   > 
   > 
   > Whoever implemented this did not understand how the load average metric works.
   > 
   > Please keep the discussion factual and about the code, not the people.
   > 
   > I think the chosen path of having separate LoadAvg and CpuPct implementations is better than some extra flag to choose. This keeps the configuration more direct and clear, and the documentation of cpuThreshold will not be conditional on another parameter - the two impls can be documented separate from each other.
   > 
   > —
   > You are receiving this because you authored the thread.
   > Reply to this email directly, view it on GitHub <https://github.com/apache/solr/pull/96#issuecomment-853442218>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ACR4IILGJTAHOMGVXL6PA7DTQ23PXANCNFSM43T5FIZA>.
   > 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] wrunderwood commented on a change in pull request #96: SOLR-15056: add circuit breaker for CPU, fix load circuit breaker

Posted by GitBox <gi...@apache.org>.
wrunderwood commented on a change in pull request #96:
URL: https://github.com/apache/solr/pull/96#discussion_r644391283



##########
File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/CircuitBreakerManager.java
##########
@@ -40,12 +41,15 @@
  * solution. There will be a follow up with a SIP for a schema API design.
  */
 public class CircuitBreakerManager implements PluginInfoInitialized {
+  private final SolrCore core;

Review comment:
       I was unaware of SolrCoreAware, I'll go check it out. Thanks.
   
   The other larger change is to have circuit breakers only interrupt external requests and to not interrupt internal requests to shards. The current approach multiples failures. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] wrunderwood commented on pull request #96: SOLR-15056: add circuit breaker for CPU, fix load circuit breaker

Posted by GitBox <gi...@apache.org>.
wrunderwood commented on pull request #96:
URL: https://github.com/apache/solr/pull/96#issuecomment-853410595






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] atris commented on pull request #96: SOLR-15056: add circuit breaker for CPU, fix load circuit breaker

Posted by GitBox <gi...@apache.org>.
atris commented on pull request #96:
URL: https://github.com/apache/solr/pull/96#issuecomment-852387820


   > The existing CPU circuit breaker has an incorrect name, so yes, it is moved to a more accurate name.
   > 
   > Load average can include things besides CPU. In some systems, it includes threads in iowait. So it should not have “CPU” in the name.
   > 
   > The documentation explains what happens when the CPU circuit breaker is not available.
   > 
   > Yes, the tests are similar because the two circuit breakers take similar inputs.
   
   
   Thank you for clarifying.
   
   My stand still remains the same -- it is a good practice to explicitly list what metrics the CPU circuit breaker is using.
   
   For the load average, I agree that it can include auxiliary inputs but always contains CPU as a factor, hence the CPU in name suggestion. It is not a blocker though.
   
   In general, I have two objections:
   
   1. API changes for the new circuit breaker -- this is causing unnecessary noise across a lot of files.
   
   2. Duplication of code -- a significant amount of code is duplicated between load average circuit breaker and the new circuit breaker.
   
   Please address these issues and iterate.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] atris commented on a change in pull request #96: SOLR-15056: add circuit breaker for CPU, fix load circuit breaker

Posted by GitBox <gi...@apache.org>.
atris commented on a change in pull request #96:
URL: https://github.com/apache/solr/pull/96#discussion_r644474656



##########
File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/CircuitBreakerManager.java
##########
@@ -40,12 +41,15 @@
  * solution. There will be a follow up with a SIP for a schema API design.
  */
 public class CircuitBreakerManager implements PluginInfoInitialized {
+  private final SolrCore core;

Review comment:
       Thanks for the input, @janhoy . This is elegant and a clean solution.
   
   RE: Not having a common manager, a common manager is needed so that SolrCore has to deal. with only one access point for all CBs. However, I agree that CircuitBreakerManager should not need to be explicitly aware of each CB implementation, will fix that for 9.0
   
   Also, I am confused about your comment of the common config class -- there is a common config class already for all circuit breakers, and the configuration is completely pluggable since it implements the PluginInfo pattern?
   
   
   
   

##########
File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/LoadAverageCircuitBreaker.java
##########
@@ -0,0 +1,115 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.util.circuitbreaker;
+
+import java.lang.invoke.MethodHandles;
+import java.lang.management.ManagementFactory;
+import java.lang.management.OperatingSystemMXBean;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * <p>
+ * Tracks current system load average and triggers if the specified threshold is breached.
+ *
+ * This circuit breaker gets the load average (length of the run queue) over the last
+ * minute and uses that data to take a decision. We depend on OperatingSystemMXBean which does
+ * not allow a configurable interval of collection of data.
+ * //TODO: Use Codahale Meter to calculate the value locally.
+ * </p>
+ *
+ * <p>
+ * The configuration to define which mode to use and the trigger threshold are defined in
+ * solrconfig.xml
+ * </p>
+ */
+public class LoadAverageCircuitBreaker extends CircuitBreaker {
+  private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+  private static final OperatingSystemMXBean operatingSystemMXBean = ManagementFactory.getOperatingSystemMXBean();
+
+  private final boolean enabled;
+  private final double loadAverageThreshold;
+
+  // Assumption -- the value of these parameters will be set correctly before invoking getDebugInfo()
+  private static final ThreadLocal<Double> seenLoadAverage = ThreadLocal.withInitial(() -> 0.0);
+
+  private static final ThreadLocal<Double> allowedLoadAverage = ThreadLocal.withInitial(() -> 0.0);
+
+  public LoadAverageCircuitBreaker(CircuitBreakerConfig config) {
+    super(config);
+
+    this.enabled = config.getLoadAverageCBEnabled();
+    this.loadAverageThreshold = config.getLoadAverageCBThreshold();
+  }
+
+  @Override
+  public boolean isTripped() {
+    if (!isEnabled()) {
+      return false;
+    }
+
+    if (!enabled) {
+      return false;
+    }
+
+    double localAllowedLoadAverage = getLoadAverageThreshold();
+    double localSeenLoadAverage = calculateLiveLoadAverage();
+
+    if (localSeenLoadAverage < 0) {
+      if (log.isWarnEnabled()) {

Review comment:
       @janhoy I am sorry, but the refactorings proposed do not really have anything to do with the actual circuit breakers themselves.
   
   The two circuit breakers (load average and proposed CPU CB) share enough method signatures to warrant common interfaces, and their tests are alike apart from class names. This is a bad code smell enough to be fixed, and I do not see why it cannot be done in this PR itself.

##########
File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/CPUCircuitBreaker.java
##########
@@ -110,6 +113,27 @@ public double getCpuUsageThreshold() {
   }
 
   protected double calculateLiveCPUUsage() {
-    return operatingSystemMXBean.getSystemLoadAverage();
+    Metric metric = this.core
+        .getCoreContainer()
+        .getMetricManager()
+        .registry("solr.jvm")
+        .getMetrics()
+        .get("os.systemCpuLoad");

Review comment:
       Yes, so here is the idea:
   
   Typical Solr clusters are dedicated node models (Solr is not expected to share resources with any other process). To my knowledge, most IO bound tasks in Solr also have a CPU impact so the possibility of a Solr operation having such a massive IO wait that it triggers the CB *without* having any significant impact on CPU is low (unless you are running Solr on USB storage drives).
   
   Please remember that the circuit breaker is designed to *only* deal with typical Solr load patterns, hence is not targeted for generic situations. Unless misconfigured, Solr will use multiple CPUs if they are available, hence I do not understand your point.
   
   Even in the scenario you described, you can simply set the trigger threshold of the CB higher so that it will trigger only when CPU usage is also high. 
   
   Let's say you set the threshold to 200. You will need more than 150 IO waiting processes *and* moderate CPU usage to trigger the threshold.
   
   Also, load average is computed by contributions of three factors -- CPU usage, tasks waiting for CPU and (in some implementations) IO blocked tasks. If you set the triggering threshold high enough, there is a very slim possibility that the IO blocked processes metric can overwhelm the others and cause the CB to trigger.
   
   But we digress. Solr is an open source project. Circuit breakers are designed to be extendable. If you need a circuit breaker with a different metric, by all means, please implement the same.
   
   Let us now focus on your complaints against the existing circuit breaker:
   
   1. Documentation not reflecting that in the platforms that choose to include IO waiting tasks as a factor in load average, CPU CB will implicitly include the same in its triggering threshold. I agree this should be included and I thank you for fixing this.
   
   2. Existing CB does not work because it limits the threshold that can be set between 0 and 1 -- Where did that come from?
   
   Solr's circuit breaker documentation (https://solr.apache.org/guide/8_7/circuit-breakers.html) does not state anything of this sort for the CPU circuit breaker. Nor does the code implement anything of this sort. The triggering threshold is unbounded, because the load average metric is unbounded. At work, we use a threshold in multiples of 100 and it works perfectly well for us.
   
   So, where is this claim coming from? Where exactly have you seen the limits being enforced?
   
   Your entire premise that the existing code does not work is based on this fulcrum of the limits, which do not exist in the code or documentation. You have been perpetually defaming the feature and have even gone to the extent of "warning" our users to immediately stop using it. My question is -- what is the basis of the claim?

##########
File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/CircuitBreakerManager.java
##########
@@ -40,12 +41,15 @@
  * solution. There will be a follow up with a SIP for a schema API design.
  */
 public class CircuitBreakerManager implements PluginInfoInitialized {
+  private final SolrCore core;

Review comment:
       @wrunderwood I have highlighted this to yourself before months back when you asked the same question, and I will repeat it again -- Solr 8x has no way of distinguishing between internal and external requests hence there is no way for circuit breakers to know what kind of requests they are seeing. I implemented this functionality in SOLR-13528 and will be doing the request level disambiguation in 9.x but there is still no way to identify the same in 8.x

##########
File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/CircuitBreaker.java
##########
@@ -66,14 +66,19 @@ protected boolean isEnabled() {
     private final int memCBThreshold;
     private final boolean cpuCBEnabled;
     private final int cpuCBThreshold;
+    private final boolean loadAverageCBEnabled;
+    private final double loadAverageCBThreshold;
 
     public CircuitBreakerConfig(final boolean enabled, final boolean memCBEnabled, final int memCBThreshold,
-                                  final boolean cpuCBEnabled, final int cpuCBThreshold) {
+                                final boolean cpuCBEnabled, final int cpuCBThreshold,

Review comment:
       I was advocating including the name of the metric being used for the CPU enabled CB (and not call it cpuCBEnabled)

##########
File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/CircuitBreakerManager.java
##########
@@ -40,12 +41,15 @@
  * solution. There will be a follow up with a SIP for a schema API design.
  */
 public class CircuitBreakerManager implements PluginInfoInitialized {
+  private final SolrCore core;

Review comment:
       There is now, post SOLR-13528, in 9x. There was none in 8x, hence I had to add that mechanism.

##########
File path: solr/core/src/test/org/apache/solr/util/TestCircuitBreaker.java
##########
@@ -199,7 +201,63 @@ public void testFakeCPUCircuitBreaker() {
       PluginInfo pluginInfo = h.getCore().getSolrConfig().getPluginInfo(CircuitBreakerManager.class.getName());
 
       CircuitBreaker.CircuitBreakerConfig circuitBreakerConfig = CircuitBreakerManager.buildCBConfig(pluginInfo);
-      CircuitBreaker circuitBreaker = new FakeCPUCircuitBreaker(circuitBreakerConfig);
+      CircuitBreaker circuitBreaker = new FakeCPUCircuitBreaker(circuitBreakerConfig, null);
+
+      h.getCore().getCircuitBreakerManager().register(circuitBreaker);
+
+      List<Future<?>> futures = new ArrayList<>();
+
+      for (int i = 0; i < 5; i++) {
+        Future<?> future = executor.submit(() -> {
+          try {
+            h.query(req("name:\"john smith\""));
+          } catch (SolrException e) {
+            assertThat(e.getMessage(), containsString("Circuit Breakers tripped"));
+            failureCount.incrementAndGet();
+          } catch (Exception e) {
+            throw new RuntimeException(e.getMessage());
+          }
+        });
+
+        futures.add(future);
+      }
+
+      for  (Future<?> future : futures) {
+        try {
+          future.get();
+        } catch (Exception e) {
+          throw new RuntimeException(e.getMessage());
+        }
+      }
+
+      executor.shutdown();
+      try {
+        executor.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
+      } catch (InterruptedException e) {
+        Thread.currentThread().interrupt();
+        throw new RuntimeException(e.getMessage());
+      }
+
+      assertEquals("Number of failed queries is not correct",5, failureCount.get());
+    } finally {
+      if (!executor.isShutdown()) {
+        executor.shutdown();
+      }
+    }
+  }
+
+  public void testFakeLoadAverageCircuitBreaker() {
+    AtomicInteger failureCount = new AtomicInteger();
+
+    ExecutorService executor = ExecutorUtil.newMDCAwareCachedThreadPool(
+        new SolrNamedThreadFactory("TestCircuitBreaker"));
+    try {
+      removeAllExistingCircuitBreakers();
+
+      PluginInfo pluginInfo = h.getCore().getSolrConfig().getPluginInfo(CircuitBreakerManager.class.getName());

Review comment:
       That does not answer my question -- they are still exact copies of each other. Lets abstract to a parent class?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] cpoerschke commented on a change in pull request #96: SOLR-15056: add circuit breaker for CPU, fix load circuit breaker

Posted by GitBox <gi...@apache.org>.
cpoerschke commented on a change in pull request #96:
URL: https://github.com/apache/solr/pull/96#discussion_r645380954



##########
File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/CircuitBreakerManager.java
##########
@@ -40,12 +41,15 @@
  * solution. There will be a follow up with a SIP for a schema API design.
  */
 public class CircuitBreakerManager implements PluginInfoInitialized {
+  private final SolrCore core;

Review comment:
       > There should be some flag on the Threadpool I think, whether the request is internal (initiated by Solr) or external. Don't recall the details.
   
   Via https://github.com/apache/solr/pull/112 I learnt about https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.8.2/solr/solrj/src/java/org/apache/solr/common/util/ExecutorUtil.java#L247-L256 i.e. `ExecutorUtil.(setServerThreadFlag|isSolrServerThread)`.
   
   That may or may not be the flag you meant.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] wrunderwood commented on pull request #96: SOLR-15056: add circuit breaker for CPU, fix load circuit breaker

Posted by GitBox <gi...@apache.org>.
wrunderwood commented on pull request #96:
URL: https://github.com/apache/solr/pull/96#issuecomment-853410595


   Anyone running this on a multi-CPU machine should disable it immediately.
   
   The documentation says it will reject traffic when total CPU usage reaches a limit, but it will really reject traffic when a single CPU reaches that limit or when one process is blocked on IO. The remaining CPUs will be idle.
   
   Whoever implemented this did not understand how the load average metric works.
   
   The current behavior is a bug and should not be used in production. Solr is not required to keep back compatibility with bugs.
   
   wunder
   Walter Underwood
   ***@***.***
   http://observer.wunderwood.org/  (my blog)
   
   > On Jun 2, 2021, at 1:23 PM, Atri Sharma ***@***.***> wrote:
   > 
   > 
   > Hi Christine,
   > 
   > Yes, merging the two circuit breakers makes sense as long as we separate out the configurations distinctly.
   > 
   > Alternative is to have the common code (and test code) in parent classes with both the circuit breakers and their infrastructure deriving from them.
   > 
   > Your point about back compatibility is important. At work, we use the CPU circuit breaker, and I am sure many other users do, too.
   > 
   > Renaming existing functionality will break that.
   > 
   > —
   > You are receiving this because you authored the thread.
   > Reply to this email directly, view it on GitHub <https://github.com/apache/solr/pull/96#issuecomment-853358369>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ACR4IIIMOEP5VQYUW5VEE6LTQ2HMRANCNFSM43T5FIZA>.
   > 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] janhoy commented on pull request #96: SOLR-15056: add circuit breaker for CPU, fix load circuit breaker

Posted by GitBox <gi...@apache.org>.
janhoy commented on pull request #96:
URL: https://github.com/apache/solr/pull/96#issuecomment-853442218


   > Whoever implemented this did not understand how the load average metric works.
   
   Please keep the discussion factual and about the code, not the people.
   
   I think the chosen path of having separate LoadAvg and CpuPct implementations is better than some extra flag to choose. This keeps the configuration more direct and clear, and the documentation of `cpuThreshold` will not be conditional on another parameter - the two impls can be documented separate from each other.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] atris commented on a change in pull request #96: SOLR-15056: add circuit breaker for CPU, fix load circuit breaker

Posted by GitBox <gi...@apache.org>.
atris commented on a change in pull request #96:
URL: https://github.com/apache/solr/pull/96#discussion_r650081537



##########
File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/CircuitBreakerManager.java
##########
@@ -40,12 +41,15 @@
  * solution. There will be a follow up with a SIP for a schema API design.
  */
 public class CircuitBreakerManager implements PluginInfoInitialized {
+  private final SolrCore core;

Review comment:
       For Solr 9, I am going to use the new flags introduced in SOLR-13528, which should be a cleaner way of achieving the goal (unless there is a problem I am missing, please?)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] wrunderwood commented on a change in pull request #96: SOLR-15056: add circuit breaker for CPU, fix load circuit breaker

Posted by GitBox <gi...@apache.org>.
wrunderwood commented on a change in pull request #96:
URL: https://github.com/apache/solr/pull/96#discussion_r643443328



##########
File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/CPUCircuitBreaker.java
##########
@@ -110,6 +113,27 @@ public double getCpuUsageThreshold() {
   }
 
   protected double calculateLiveCPUUsage() {
-    return operatingSystemMXBean.getSystemLoadAverage();
+    Metric metric = this.core
+        .getCoreContainer()
+        .getMetricManager()
+        .registry("solr.jvm")
+        .getMetrics()
+        .get("os.systemCpuLoad");

Review comment:
       The documentation and the linked Wikipedia page give more information on load average.
   
   Load average is a "system" load average, not just CPU. From Wikipedia:
   
   "An idle computer has a load number of 0 (the idle process isn't counted). Each process using or waiting for CPU (the ready queue or run queue) increments the load number by 1. Each process that terminates decrements it by 1. Most UNIX systems count only processes in the running (on CPU) or runnable (waiting for CPU) states. However, Linux also includes processes in uninterruptible sleep states (usually waiting for disk activity), which can lead to markedly different results if many processes remain blocked in I/O due to a busy or stalled I/O system."




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org