Posted to notifications@skywalking.apache.org by wu...@apache.org on 2021/06/30 02:18:11 UTC

[skywalking] branch master updated: Optimize IDs reading in the persistent worker. (#7193)

This is an automated email from the ASF dual-hosted git repository.

wusheng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/skywalking.git


The following commit(s) were added to refs/heads/master by this push:
     new 16ebaad  Optimize IDs reading in the persistent worker. (#7193)
16ebaad is described below

commit 16ebaade6a75043f2d6ab398def3281fd37af74e
Author: 吴晟 Wu Sheng <wu...@foxmail.com>
AuthorDate: Wed Jun 30 10:17:46 2021 +0800

    Optimize IDs reading in the persistent worker. (#7193)
    
    * Optimize IDs reading in the persistent worker.
---
 CHANGES.md                                         | 36 +++++++++++++---------
 .../analysis/worker/MetricsPersistentWorker.java   | 19 +++++++++---
 .../oap/server/core/storage/IMetricsDAO.java       |  4 +--
 3 files changed, 39 insertions(+), 20 deletions(-)

diff --git a/CHANGES.md b/CHANGES.md
index 9d49bf9..8ae7936 100644
--- a/CHANGES.md
+++ b/CHANGES.md
@@ -4,39 +4,45 @@ Release Notes.
 
 8.7.0
 ------------------
+
 #### Project
+
 * Extract dependency management to a bom.
 * Add JDK 16 to test matrix.
 
 #### Java Agent
+
 * Supports modifying span attributes in async mode.
 * Agent supports the collection of JVM arguments and jar dependency information.
-* [Temporary] Support authentication for log report channel. This feature and grpc channel is going to be removed after Satellite 0.2.0 release.
-* Remove deprecated gRPC method, `io.grpc.ManagedChannelBuilder#nameResolverFactory`. See [gRPC-java 7133](https://github.com/grpc/grpc-java/issues/7133) for more details.
+* [Temporary] Support authentication for log report channel. This feature and grpc channel is going to be removed after
+  Satellite 0.2.0 release.
+* Remove deprecated gRPC method, `io.grpc.ManagedChannelBuilder#nameResolverFactory`.
+  See [gRPC-java 7133](https://github.com/grpc/grpc-java/issues/7133) for more details.
 * Add `Neo4j-4.x` plugin.
 * Correct `profile.duration` to `profile.max_duration` in the default `agent.config` file.
-* Fix the reponse time of gRPC.
+* Fix the response time of gRPC.
 
 #### OAP-Backend
+
 * Disable Spring sleuth meter analyzer by default.
 * Only count 5xx as error in Envoy ALS receiver.
 * Upgrade apollo core caused by CVE-2020-15170.
 * Upgrade kubernetes client caused by CVE-2020-28052.
 * Upgrade Elasticsearch 7 client caused by CVE-2020-7014.
-* Upgrade jackson related libs caused by CVE-2018-11307, CVE-2018-14718 ~ CVE-2018-14721, CVE-2018-19360 ~ CVE-2018-19362,
-   CVE-2019-14379, CVE-2019-14540, CVE-2019-14892, CVE-2019-14893, CVE-2019-16335, CVE-2019-16942, CVE-2019-16943,
-   CVE-2019-17267, CVE-2019-17531, CVE-2019-20330, CVE-2020-8840, CVE-2020-9546, CVE-2020-9547, CVE-2020-9548,
-   CVE-2018-12022, CVE-2018-12023, CVE-2019-12086, CVE-2019-14439, CVE-2020-10672, CVE-2020-10673, CVE-2020-10968,
-   CVE-2020-10969, CVE-2020-11111, CVE-2020-11112, CVE-2020-11113, CVE-2020-11619, CVE-2020-11620, CVE-2020-14060,
-   CVE-2020-14061, CVE-2020-14062, CVE-2020-14195, CVE-2020-24616, CVE-2020-24750, CVE-2020-25649, CVE-2020-35490,
-   CVE-2020-35491, CVE-2020-35728 and CVE-2020-36179 ~ CVE-2020-36190.
+* Upgrade jackson related libs caused by CVE-2018-11307, CVE-2018-14718 ~ CVE-2018-14721, CVE-2018-19360 ~
+  CVE-2018-19362, CVE-2019-14379, CVE-2019-14540, CVE-2019-14892, CVE-2019-14893, CVE-2019-16335, CVE-2019-16942,
+  CVE-2019-16943, CVE-2019-17267, CVE-2019-17531, CVE-2019-20330, CVE-2020-8840, CVE-2020-9546, CVE-2020-9547,
+  CVE-2020-9548, CVE-2018-12022, CVE-2018-12023, CVE-2019-12086, CVE-2019-14439, CVE-2020-10672, CVE-2020-10673,
+  CVE-2020-10968, CVE-2020-10969, CVE-2020-11111, CVE-2020-11112, CVE-2020-11113, CVE-2020-11619, CVE-2020-11620,
+  CVE-2020-14060, CVE-2020-14061, CVE-2020-14062, CVE-2020-14195, CVE-2020-24616, CVE-2020-24750, CVE-2020-25649,
+  CVE-2020-35490, CVE-2020-35491, CVE-2020-35728 and CVE-2020-36179 ~ CVE-2020-36190.
 * Exclude log4j 1.x caused by CVE-2019-17571.
 * Upgrade log4j 2.x caused by CVE-2020-9488.
 * Upgrade nacos libs caused by CVE-2021-29441 and CVE-2021-29442.
-* Upgrade netty caused by CVE-2019-20444, CVE-2019-20445, CVE-2019-16869, CVE-2020-11612, CVE-2021-21290, CVE-2021-21295 
-   and CVE-2021-21409.
+* Upgrade netty caused by CVE-2019-20444, CVE-2019-20445, CVE-2019-16869, CVE-2020-11612, CVE-2021-21290, CVE-2021-21295
+  and CVE-2021-21409.
 * Upgrade consul client caused by CVE-2018-1000844, CVE-2018-1000850.
-* Upgrade zookeeper caused by CVE-2019-0201. 
+* Upgrade zookeeper caused by CVE-2019-0201.
 * Upgrade snake yaml caused by CVE-2017-18640.
 * Upgrade embed tomcat caused by CVE-2020-13935.
 * Upgrade commons-lang3 to avoid potential NPE in some JDK versions.
@@ -49,8 +55,11 @@ Release Notes.
 * Fix: slowDBAccessThreshold dynamic config error when not configured.
 * Performance: cache regex pattern and result, optimize string concatenation in Envy ALS analyzer.
 * Performance: cache metrics id and entity id in `Metrics` and `ISource`.
+* Performance: enhance persistent session mechanism, about differentiating cache timeout for different dimensionality
+  metrics. The timeout of the cache for minute and hour level metrics has been prolonged to ~5 min. 
 
 #### UI
+
 * Fix the date component for log conditions.
 * Fix selector keys for duplicate options.
 * Add Python celery plugin.
@@ -59,7 +68,6 @@ Release Notes.
 
 #### Documentation
 
-
 All issues and pull requests are [here](https://github.com/apache/skywalking/milestone/90?closed=1)
 
 ------------------
diff --git a/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/analysis/worker/MetricsPersistentWorker.java b/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/analysis/worker/MetricsPersistentWorker.java
index 5195ac6..b7c0735 100644
--- a/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/analysis/worker/MetricsPersistentWorker.java
+++ b/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/analysis/worker/MetricsPersistentWorker.java
@@ -51,6 +51,11 @@ import org.apache.skywalking.oap.server.telemetry.api.MetricsTag;
  */
 @Slf4j
 public class MetricsPersistentWorker extends PersistenceWorker<Metrics> {
+    /**
+     * The counter of MetricsPersistentWorker instance, to calculate session timeout offset.
+     */
+    private static long SESSION_TIMEOUT_OFFSITE_COUNTER = 0;
+
     private final Model model;
     private final Map<Metrics, Metrics> context;
     private final IMetricsDAO metricsDAO;
@@ -61,6 +66,7 @@ public class MetricsPersistentWorker extends PersistenceWorker<Metrics> {
     private final boolean enableDatabaseSession;
     private final boolean supportUpdate;
     private CounterMetrics aggregationCounter;
+    private long sessionTimeout = 70_000; // Unit, ms. 70,000ms means more than one minute.
 
     MetricsPersistentWorker(ModuleDefineHolder moduleDefineHolder, Model model, IMetricsDAO metricsDAO,
                             AbstractWorker<Metrics> nextAlarmWorker, AbstractWorker<ExportEvent> nextExportWorker,
@@ -98,10 +104,11 @@ public class MetricsPersistentWorker extends PersistenceWorker<Metrics> {
             new MetricsTag.Keys("metricName", "level", "dimensionality"),
             new MetricsTag.Values(model.getName(), "2", model.getDownsampling().getName())
         );
+        SESSION_TIMEOUT_OFFSITE_COUNTER++;
     }
 
     /**
-     * Create the leaf MetricsPersistentWorker, no next step.
+     * Create the leaf and down-sampling MetricsPersistentWorker, no next step.
      */
     MetricsPersistentWorker(ModuleDefineHolder moduleDefineHolder, Model model, IMetricsDAO metricsDAO,
                             boolean enableDatabaseSession, boolean supportUpdate) {
@@ -109,6 +116,10 @@ public class MetricsPersistentWorker extends PersistenceWorker<Metrics> {
              null, null, null,
              enableDatabaseSession, supportUpdate
         );
+        // For a down-sampling metrics, we prolong the session timeout for 4 times, nearly 5 minutes.
+        // And add offset according to worker creation sequence, to avoid context clear overlap,
+        // eventually optimize load of IDs reading.
+        this.sessionTimeout = sessionTimeout * 4 + SESSION_TIMEOUT_OFFSITE_COUNTER * 200;
     }
 
     /**
@@ -216,7 +227,7 @@ public class MetricsPersistentWorker extends PersistenceWorker<Metrics> {
             if (noInCacheMetrics.isEmpty()) {
                 return;
             }
-
+            
             final List<Metrics> dbMetrics = metricsDAO.multiGet(model, noInCacheMetrics);
             if (!enableDatabaseSession) {
                 // Clear the cache only after results from DB are returned successfully.
@@ -235,8 +246,8 @@ public class MetricsPersistentWorker extends PersistenceWorker<Metrics> {
             while (iterator.hasNext()) {
                 Metrics metrics = iterator.next();
                 metrics.extendSurvivalTime(tookTime);
-                // 70,000ms means more than one minute.
-                if (metrics.getSurvivalTime() > 70000) {
+
+                if (metrics.getSurvivalTime() > sessionTimeout) {
                     iterator.remove();
                 }
             }
diff --git a/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/storage/IMetricsDAO.java b/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/storage/IMetricsDAO.java
index c1ef378..2b4c05a 100644
--- a/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/storage/IMetricsDAO.java
+++ b/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/storage/IMetricsDAO.java
@@ -32,8 +32,8 @@ public interface IMetricsDAO extends DAO {
     /**
      * Read data from the storage by given IDs.
      *
-     * @param model     target entity of this query.
-     * @param metrics   metrics list.
+     * @param model   target entity of this query.
+     * @param metrics metrics list.
      * @return the data of all given IDs. Only include existing data. Don't require to keep the same order of ids list.
      * @throws IOException when error occurs in data query.
      */