You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@solr.apache.org by "mlbiscoc (via GitHub)" <gi...@apache.org> on 2023/05/05 17:32:17 UTC

[GitHub] [solr] mlbiscoc opened a new pull request, #1627: SOLR-16273: Prometheus Metric Exporter is very slow when collecting large amounts of sample data

mlbiscoc opened a new pull request, #1627:
URL: https://github.com/apache/solr/pull/1627

   https://issues.apache.org/jira/browse/SOLR-16273
   
   <!--
   _(If you are a project committer then you may remove some/all of the following template.)_
   
   Before creating a pull request, please file an issue in the ASF Jira system for Solr:
   
   * https://issues.apache.org/jira/projects/SOLR
   
   For something minor (i.e. that wouldn't be worth putting in release notes), you can skip JIRA. 
   To create a Jira issue, you will need to create an account there first.
   
   The title of the PR should reference the Jira issue number in the form:
   
   * SOLR-####: <short description of problem or changes>
   
   SOLR must be fully capitalized. A short description helps people scanning pull requests for items they can work on.
   
   Properly referencing the issue in the title ensures that Jira is correctly updated with code review comments and commits. -->
   
   
   # Description
   
   MetricFamilySamples from prometheus contains a List of samples and a "contains" check is called to avoid added duplicate metrics. With large lists, this reduces the prometheus exporters performance with large amounts of collections or lots of metrics scraping
   
   # Solution
   
   Add in a Hashset into MetricSamples.java as a cache and cache the whole metric as a string and check against the cache instead of checking against the List with contains.
   
   # Tests
   
   Update unit tests where cache is implemented and verify no duplicate samples were added to a metric family.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [ ] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability.
   - [ ] I have created a Jira issue and added the issue ID to my pull request title.
   - [ ] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
   - [ ] I have developed this patch against the `main` branch.
   - [ ] I have run `./gradlew check`.
   - [ ] I have added tests for my changes.
   - [ ] I have added documentation for the [Reference Guide](https://github.com/apache/solr/tree/main/solr/solr-ref-guide)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] mlbiscoc commented on pull request #1627: SOLR-16273: Prometheus Metric Exporter is very slow when collecting large amounts of sample data

Posted by "mlbiscoc (via GitHub)" <gi...@apache.org>.
mlbiscoc commented on PR #1627:
URL: https://github.com/apache/solr/pull/1627#issuecomment-1577424719

   @dsmiley Thanks for reviewing with suggestions. I made the changes accordingly.
   Only part I did not change was for adding a higher level test.
   
   I also thought about doing this earlier in development but getting duplicate metrics is caused from more than 1 collection. The current configuration for CloudScraperTest is using 1 collection so I would need to refactor all the tests with another collection and anything using that configuration so I felt that a unit test to catch the duplication was suffice and easiest to test for without potentially breaking tests.
   
   I know you said it is not critical but if you feel strong on adding a high level test to catch duplication, I can go forward with that doing that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] mlbiscoc commented on pull request #1627: SOLR-16273: Prometheus Metric Exporter is very slow when collecting large amounts of sample data

Posted by "mlbiscoc (via GitHub)" <gi...@apache.org>.
mlbiscoc commented on PR #1627:
URL: https://github.com/apache/solr/pull/1627#issuecomment-1569123316

   @dsmiley I noticed you've done some work with the prometheus exporter. Would you have any input for these changes?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] mlbiscoc commented on pull request #1627: SOLR-16273: Prometheus Metric Exporter is very slow when collecting large amounts of sample data

Posted by "mlbiscoc (via GitHub)" <gi...@apache.org>.
mlbiscoc commented on PR #1627:
URL: https://github.com/apache/solr/pull/1627#issuecomment-1554557917

   Hello, is there any other additional comments or suggestions to review for improving this PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] dsmiley commented on a diff in pull request #1627: SOLR-16273: Prometheus Metric Exporter is very slow when collecting large amounts of sample data

Posted by "dsmiley (via GitHub)" <gi...@apache.org>.
dsmiley commented on code in PR #1627:
URL: https://github.com/apache/solr/pull/1627#discussion_r1211048052


##########
solr/prometheus-exporter/src/java/org/apache/solr/prometheus/collector/MetricSamples.java:
##########
@@ -19,24 +19,42 @@
 
 import io.prometheus.client.Collector;
 import java.util.HashMap;
+import java.util.HashSet;
 import java.util.List;
 import java.util.Map;
+import java.util.Set;
 import java.util.stream.Collectors;
 
 public class MetricSamples {
 
   private final Map<String, Collector.MetricFamilySamples> samplesByMetricName;
 
+  private final Set<String> sampleMetricsCache;
+
   public MetricSamples(Map<String, Collector.MetricFamilySamples> input) {
     samplesByMetricName = input;
+    sampleMetricsCache = new HashSet<>();
+    for (Collector.MetricFamilySamples metricFamilySamples : input.values()) {
+      addSamplesToCache(metricFamilySamples);
+    }
   }
 
   public MetricSamples() {
     this(new HashMap<>());
   }
 
-  public void addSamplesIfNotPresent(String metricName, Collector.MetricFamilySamples samples) {
-    samplesByMetricName.putIfAbsent(metricName, samples);
+  private void addSamplesToCache(Collector.MetricFamilySamples metricFamilySamples) {

Review Comment:
   The diff was confusing to read here.  I think it would be clearer if you move this method to *below* the method that calls it; not before.



##########
solr/prometheus-exporter/src/java/org/apache/solr/prometheus/collector/MetricSamples.java:
##########
@@ -19,24 +19,42 @@
 
 import io.prometheus.client.Collector;
 import java.util.HashMap;
+import java.util.HashSet;
 import java.util.List;
 import java.util.Map;
+import java.util.Set;
 import java.util.stream.Collectors;
 
 public class MetricSamples {
 
   private final Map<String, Collector.MetricFamilySamples> samplesByMetricName;
 
+  private final Set<String> sampleMetricsCache;
+
   public MetricSamples(Map<String, Collector.MetricFamilySamples> input) {
     samplesByMetricName = input;
+    sampleMetricsCache = new HashSet<>();
+    for (Collector.MetricFamilySamples metricFamilySamples : input.values()) {
+      addSamplesToCache(metricFamilySamples);
+    }
   }
 
   public MetricSamples() {
     this(new HashMap<>());
   }
 
-  public void addSamplesIfNotPresent(String metricName, Collector.MetricFamilySamples samples) {
-    samplesByMetricName.putIfAbsent(metricName, samples);
+  private void addSamplesToCache(Collector.MetricFamilySamples metricFamilySamples) {
+    for (Collector.MetricFamilySamples.Sample sample : metricFamilySamples.samples) {
+      sampleMetricsCache.add(sample.toString());
+    }
+  }
+
+  public void addSamplesIfNotPresent(
+      String metricName, Collector.MetricFamilySamples metricFamilySamples) {
+    if (!samplesByMetricName.containsKey(metricName)) {
+      samplesByMetricName.put(metricName, metricFamilySamples);

Review Comment:
   There used to be a putIfAbsent call here, which is more elegant (and efficient) than a check and put.  You can still use that; just rely on the return value of putIfAbsent being null to check if you then should call addSamplesToCache.  If you think it's still unclear, you could add a comment.



##########
solr/prometheus-exporter/src/java/org/apache/solr/prometheus/collector/MetricSamples.java:
##########
@@ -47,7 +65,8 @@ public void addSampleIfMetricExists(
       return;
     }
 
-    if (!sampleFamily.samples.contains(sample)) {
+    if (!sampleMetricsCache.contains(sample.toString())) {
+      sampleMetricsCache.add(sample.toString());

Review Comment:
   Again, you could use a sampleMetricCache.add(sample.toString()) and look at the response boolean to know if it was added (wasn't already present).  Is more efficient and moreover avoids a second sample.toString() call easily.
   
   Why is sampleMetricsCache using the toString of the Sample at all; why isn't it simply `Set<Sample>`?  Also; sampleMetricsCache isn't a cache; it's not the ideal name/word for it.  I would call it "seenSamples".



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] mlbiscoc commented on pull request #1627: SOLR-16273: Prometheus Metric Exporter is very slow when collecting large amounts of sample data

Posted by "mlbiscoc (via GitHub)" <gi...@apache.org>.
mlbiscoc commented on PR #1627:
URL: https://github.com/apache/solr/pull/1627#issuecomment-1544685754

   Hi everyone, this should be ready for review now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] mlbiscoc commented on a diff in pull request #1627: SOLR-16273: Prometheus Metric Exporter is very slow when collecting large amounts of sample data

Posted by "mlbiscoc (via GitHub)" <gi...@apache.org>.
mlbiscoc commented on code in PR #1627:
URL: https://github.com/apache/solr/pull/1627#discussion_r1192417543


##########
solr/prometheus-exporter/src/java/org/apache/solr/prometheus/collector/MetricSamples.java:
##########
@@ -19,24 +19,42 @@
 
 import io.prometheus.client.Collector;
 import java.util.HashMap;
+import java.util.HashSet;
 import java.util.List;
 import java.util.Map;
+import java.util.Set;
 import java.util.stream.Collectors;
 
 public class MetricSamples {
 
   private final Map<String, Collector.MetricFamilySamples> samplesByMetricName;
 
+  private Set<String> sampleMetricsCache;

Review Comment:
   Changed into final. Thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] dsmiley commented on a diff in pull request #1627: SOLR-16273: Prometheus Metric Exporter is very slow when collecting large amounts of sample data

Posted by "dsmiley (via GitHub)" <gi...@apache.org>.
dsmiley commented on code in PR #1627:
URL: https://github.com/apache/solr/pull/1627#discussion_r1223844855


##########
solr/prometheus-exporter/src/java/org/apache/solr/prometheus/collector/MetricSamples.java:
##########
@@ -71,4 +83,8 @@ public List<Collector.MetricFamilySamples> asList() {
         .filter(value -> !value.samples.isEmpty())
         .collect(Collectors.toList());
   }
+
+  private void addUnseenSamples(Collector.MetricFamilySamples metricFamilySamples) {
+    seenSamples.addAll(metricFamilySamples.samples);

Review Comment:
   This is confusing to me... a method named addUnseenSamples that calls seenSamples.addAll.  It seems inverted (and seemed as such at the call sites too; I wonder how does one add something you don't see?), yet I can also imagine how one might justify its name. Personally I think inlining this method would be better and would resolve it.  One less thing to name :-)  I know this is a philosophical bike shed for developers; some people love one-liner methods -- admittedly I'm not one of those people.
   
   If you love it as is, okay.  Maybe its just me.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] mlbiscoc commented on a diff in pull request #1627: SOLR-16273: Prometheus Metric Exporter is very slow when collecting large amounts of sample data

Posted by "mlbiscoc (via GitHub)" <gi...@apache.org>.
mlbiscoc commented on code in PR #1627:
URL: https://github.com/apache/solr/pull/1627#discussion_r1192578929


##########
solr/prometheus-exporter/src/java/org/apache/solr/prometheus/collector/MetricSamples.java:
##########
@@ -47,7 +65,8 @@ public void addSampleIfMetricExists(
       return;
     }
 
-    if (!sampleFamily.samples.contains(sample)) {
+    if (!sampleMetricsCache.contains(sample.toString())) {
+      sampleMetricsCache.add(sample.toString());

Review Comment:
   No problem! I am also still learning this code base as well but please ask anything if it needs clarification.
   
   Exactly, this PR is to address the fact that we are searching in `O(n)` but we can change that to `O(1)`. Testing locally, I found significant performance increase.
   
   As for the `.toString` not including `exemplar`, I felt it was a non-issue because when we scrape the metrics from Solr, the prometheus exporter creates the sample object, but with `timestamp and exemplar = null`. So there will never be a string with a `timestamp` or `exemplar`. You can see here -> https://github.com/apache/solr/blob/7a31a1e23f6034c14639d7a208c5aaec64608cfd/solr/prometheus-exporter/src/java/org/apache/solr/prometheus/scraper/SolrScraper.java#L205



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] dsmiley merged pull request #1627: SOLR-16273: Prometheus Metric Exporter is very slow when collecting large amounts of sample data

Posted by "dsmiley (via GitHub)" <gi...@apache.org>.
dsmiley merged PR #1627:
URL: https://github.com/apache/solr/pull/1627


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] mlbiscoc commented on a diff in pull request #1627: SOLR-16273: Prometheus Metric Exporter is very slow when collecting large amounts of sample data

Posted by "mlbiscoc (via GitHub)" <gi...@apache.org>.
mlbiscoc commented on code in PR #1627:
URL: https://github.com/apache/solr/pull/1627#discussion_r1224493642


##########
solr/prometheus-exporter/src/java/org/apache/solr/prometheus/collector/MetricSamples.java:
##########
@@ -71,4 +83,8 @@ public List<Collector.MetricFamilySamples> asList() {
         .filter(value -> !value.samples.isEmpty())
         .collect(Collectors.toList());
   }
+
+  private void addUnseenSamples(Collector.MetricFamilySamples metricFamilySamples) {
+    seenSamples.addAll(metricFamilySamples.samples);

Review Comment:
   Took another look at it and I agree, this is confusing. Just removed that function and added it inline where it was being called and updated CHANGES.txt
   
   As for the duplication explanation, I apologize I was wrong. It is not caused by collections, but cores. For the default configuration, `solr_ping` seems to be the biggest culprit because the labels are only `base_url` and `cluster_id`. The PingCollection scrapes the metrics at all cores, but `solr_ping` have the same labels and values across cores with the same host. This issue could possibly arise as well with custom configurations on metrics finding duplicates if the labels aren't unique enough. so this `seenSamples` should catch that and not output duplicates.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] mlbiscoc commented on a diff in pull request #1627: SOLR-16273: Prometheus Metric Exporter is very slow when collecting large amounts of sample data

Posted by "mlbiscoc (via GitHub)" <gi...@apache.org>.
mlbiscoc commented on code in PR #1627:
URL: https://github.com/apache/solr/pull/1627#discussion_r1224493642


##########
solr/prometheus-exporter/src/java/org/apache/solr/prometheus/collector/MetricSamples.java:
##########
@@ -71,4 +83,8 @@ public List<Collector.MetricFamilySamples> asList() {
         .filter(value -> !value.samples.isEmpty())
         .collect(Collectors.toList());
   }
+
+  private void addUnseenSamples(Collector.MetricFamilySamples metricFamilySamples) {
+    seenSamples.addAll(metricFamilySamples.samples);

Review Comment:
   Took another look at it and I agree, this is confusing. Just removed that function and added it inline where it was being called.
   
   As for the duplication explanation, I apologize I was wrong. It is not caused by collections, but cores. For the default configuration, `solr_ping` seems to be the biggest culprit because the labels are only `base_url` and `cluster_id`. The PingCollection scrapes the metrics at all cores, but `solr_ping` have the same labels and values across cores with the same host. This issue could possibly arise as well with custom configurations on metrics finding duplicates if the labels aren't unique enough. so this `seenSamples` should catch that and not output duplicates.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] cpoerschke commented on a diff in pull request #1627: SOLR-16273: Prometheus Metric Exporter is very slow when collecting large amounts of sample data

Posted by "cpoerschke (via GitHub)" <gi...@apache.org>.
cpoerschke commented on code in PR #1627:
URL: https://github.com/apache/solr/pull/1627#discussion_r1192503528


##########
solr/prometheus-exporter/src/java/org/apache/solr/prometheus/collector/MetricSamples.java:
##########
@@ -47,7 +65,8 @@ public void addSampleIfMetricExists(
       return;
     }
 
-    if (!sampleFamily.samples.contains(sample)) {
+    if (!sampleMetricsCache.contains(sample.toString())) {
+      sampleMetricsCache.add(sample.toString());

Review Comment:
   Thanks @mlbiscoc for explaining the issue in the PR description, I'm new to this part of the code base and the explanation really helped!
   
   So here the `sampleFamily.samples.contains` is inefficient for a large list and the idea is that a `HashSet.contains` would be more efficient.
   
   It seems that `sample.toString()` doesn't include the `exemplar` member -- https://github.com/prometheus/client_java/blob/parent-0.16.0/simpleclient/src/main/java/io/prometheus/client/Collector.java#L256-L292 -- i.e. comparison of `sample` vs. comparison via `sample.toString()` could theoretically differ? Hmm.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] cpoerschke commented on a diff in pull request #1627: SOLR-16273: Prometheus Metric Exporter is very slow when collecting large amounts of sample data

Posted by "cpoerschke (via GitHub)" <gi...@apache.org>.
cpoerschke commented on code in PR #1627:
URL: https://github.com/apache/solr/pull/1627#discussion_r1192075956


##########
solr/prometheus-exporter/src/java/org/apache/solr/prometheus/collector/MetricSamples.java:
##########
@@ -19,24 +19,42 @@
 
 import io.prometheus.client.Collector;
 import java.util.HashMap;
+import java.util.HashSet;
 import java.util.List;
 import java.util.Map;
+import java.util.Set;
 import java.util.stream.Collectors;
 
 public class MetricSamples {
 
   private final Map<String, Collector.MetricFamilySamples> samplesByMetricName;
 
+  private Set<String> sampleMetricsCache;

Review Comment:
   minor: maybe could be final (from cursory code read only)
   ```suggestion
     private final Set<String> sampleMetricsCache;
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org