You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/09/29 11:00:45 UTC

[GitHub] [druid] ayushkul2910 opened a new pull request #10448: Added CronScheduler support as a proof to clock drift while emitting metrics

ayushkul2910 opened a new pull request #10448:
URL: https://github.com/apache/druid/pull/10448


   <!-- Thanks for trying to help us make Apache Druid be the best it can be! Please fill out as much of the following information as is possible (where relevant, and remove it when irrelevant) to help make the intention and scope of this PR clear in order to ease review. -->
   
   Fixes #9283.
   
   <!-- Replace XXXX with the id of the issue fixed in this PR. Remove this section if there is no corresponding issue. Don't reference the issue in the title of this pull-request. -->
   
   <!-- If you are a committer, follow the PR action item checklist for committers:
   https://github.com/apache/druid/blob/master/dev/committer-instructions.md#pr-and-issue-action-item-checklist-for-committers. -->
   
   ### Description
   
   <!-- Describe the goal of this PR, what problem are you fixing. If there is a corresponding issue (referenced above), it's not necessary to repeat the description here, however, you may choose to keep one summary sentence. -->
   
   This PR fixes the clock drift issue while emitting metrics. ScheduledExecutorService is very much prone to clock drift especially in certain windows platforms (see this: https://stackoverflow.com/questions/56571647/why-does-the-java-scheduler-exhibit-significant-time-drift-on-windows). I have used [CronScheduler](https://github.com/TimeAndSpaceIO/CronScheduler) instead of ScheduledExecutorService, since CronScheduler provides proof against the clock drift issue.
   
   <!-- Describe your patch: what did you change in code? How did you fix the problem? -->
   I have created a new method called scheduleAtFixedRate in ScheduledExecutors.java which takes CronScheduler as a param. This method is responsible for scheduling the metrics at a particular period after a certain initial delay. In current implementation the method uses ScheduledExecutorService which is prone to clock drift.
   
   <!-- If there are several relatively logically separate changes in this PR, create a mini-section for each of them. For example: -->
   
   <!--
   In each section, please describe design decisions made, including:
    - Choice of algorithms
    - Behavioral aspects. What configuration values are acceptable? How are corner cases and error conditions handled, such as when there are insufficient resources?
    - Class organization and design (how the logic is split between classes, inheritance, composition, design patterns)
    - Method organization and design (how the logic is split between methods, parameters and return types)
    - Naming (class, method, API, configuration, HTTP endpoint, names of emitted metrics)
   -->
   
   
   <!-- It's good to describe an alternative design (or mention an alternative name) for every design (or naming) decision point and compare the alternatives with the designs that you've implemented (or the names you've chosen) to highlight the advantages of the chosen designs and names. -->
   
   <!-- If there was a discussion of the design of the feature implemented in this PR elsewhere (e. g. a "Proposal" issue, any other issue, or a thread in the development mailing list), link to that discussion from this PR description and explain what have changed in your final design compared to your original proposal or the consensus version in the end of the discussion. If something hasn't changed since the original discussion, you can omit a detailed discussion of those aspects of the design here, perhaps apart from brief mentioning for the sake of readability of this PR description. -->
   
   <!-- Some of the aspects mentioned above may be omitted for simple and small changes. -->
   
   <hr>
   
   This PR has:
   - [x] been self-reviewed.
   - [x] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [x] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
   
   <!-- Check the items by putting "x" in the brackets for the done things. Not all of these items apply to every PR. Remove the items which are not done or not relevant to the PR. None of the items from the checklist above are strictly necessary, but it would be very helpful if you at least self-review the PR. -->
   
   <hr>
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] leventov merged pull request #10448: Added CronScheduler support as a proof to clock drift while emitting metrics

Posted by GitBox <gi...@apache.org>.
leventov merged pull request #10448:
URL: https://github.com/apache/druid/pull/10448


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson edited a comment on pull request #10448: Added CronScheduler support as a proof to clock drift while emitting metrics

Posted by GitBox <gi...@apache.org>.
jihoonson edited a comment on pull request #10448:
URL: https://github.com/apache/druid/pull/10448#issuecomment-755111625


   Hi @leventov, as test coverage is not very great for Druid metrics system, I'm wondering how much the CronScheduler is well-tested in production environment, especially when it runs for a long time period. Could you give me some idea?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] ayushkul2910 commented on a change in pull request #10448: Added CronScheduler support as a proof to clock drift while emitting metrics

Posted by GitBox <gi...@apache.org>.
ayushkul2910 commented on a change in pull request #10448:
URL: https://github.com/apache/druid/pull/10448#discussion_r512608527



##########
File path: core/src/main/java/org/apache/druid/java/util/common/concurrent/ScheduledExecutors.java
##########
@@ -167,6 +169,50 @@ public void run()
     );
   }
 
+  public static void scheduleAtFixedRate(CronScheduler exec, Duration rate, Callable<Signal> callable)
+  {
+    scheduleAtFixedRate(exec, rate, rate, callable);
+  }
+
+  /**
+   * Run callable once every period, after the given initial delay. Uses
+   * {@link CronScheduler} for task scheduling. Exceptions are caught and logged
+   * as errors.
+   */
+  public static void scheduleAtFixedRate(
+      final CronScheduler exec,
+      final Duration initialDelay,
+      final Duration rate,
+      final Callable<Signal> callable
+  )
+  {
+    log.debug("Scheduling periodically: %s with period %s", callable, rate);
+    Instant delayInstance = Instant.now().plusMillis(initialDelay.getMillis());
+    exec.scheduleAt(delayInstance,

Review comment:
       Keeping the above points in mind, I think this implementation will do.
   
       synchronized (lock) {
         monitor.start();
         Long rate = config.getEmitterPeriod().getMillis();
         Future<?> scheduledFuture = scheduler.scheduleAtFixedRate(
             rate,
             rate,
             TimeUnit.MILLISECONDS,
             new CronTask()
             {
               private volatile Future<Boolean> monitorFuture = null;
               @Override
               public void run(long scheduledRunTimeMillis)
               {
                 try {
                   if (monitorFuture != null && monitorFuture.isDone()
                       && !(monitorFuture.get() && hasMonitor(monitor))) {
                     removeMonitor(monitor);
                     monitor.getScheduledFuture().cancel(false);
                     log.debug("Stopped rescheduling %s (delay %s)", this, rate);
                     return;
                   }
                   
                   log.trace("Running %s (period %s)", this, rate);
                   monitorFuture = executor.submit(new Callable<Boolean>()
                   {
                     public Boolean call()
                     {
                       try {
                         return monitor.monitor(emitter);
                       } catch (Throwable e) {
                         log.error(e, "Uncaught exception.");
                         return false;
                       }
                     }
                   });
                 } 
                 catch (Throwable e) {
                   log.error(e, "Uncaught exception.");
                 }
               }
             });
         monitor.setScheduledFuture(scheduledFuture);
       }
   
   
   In this:
   1.  Each monitor has a separate future.
   2.  Cron task is cheap, it checks a boolean condition. If condition is true it cancels the scheduling process for the 
        particular monitor, else submits a callable for monitoring to executor service 
   3.  No race condition for cancelling scheduledFuture on first iteration.
   
   Please let me know your thoughts on this.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] ayushkul2910 commented on a change in pull request #10448: Added CronScheduler support as a proof to clock drift while emitting metrics

Posted by GitBox <gi...@apache.org>.
ayushkul2910 commented on a change in pull request #10448:
URL: https://github.com/apache/druid/pull/10448#discussion_r512608527



##########
File path: core/src/main/java/org/apache/druid/java/util/common/concurrent/ScheduledExecutors.java
##########
@@ -167,6 +169,50 @@ public void run()
     );
   }
 
+  public static void scheduleAtFixedRate(CronScheduler exec, Duration rate, Callable<Signal> callable)
+  {
+    scheduleAtFixedRate(exec, rate, rate, callable);
+  }
+
+  /**
+   * Run callable once every period, after the given initial delay. Uses
+   * {@link CronScheduler} for task scheduling. Exceptions are caught and logged
+   * as errors.
+   */
+  public static void scheduleAtFixedRate(
+      final CronScheduler exec,
+      final Duration initialDelay,
+      final Duration rate,
+      final Callable<Signal> callable
+  )
+  {
+    log.debug("Scheduling periodically: %s with period %s", callable, rate);
+    Instant delayInstance = Instant.now().plusMillis(initialDelay.getMillis());
+    exec.scheduleAt(delayInstance,

Review comment:
       Keeping your above points in mind, I think this implementation will do.
   
       synchronized (lock) {
         monitor.start();
         Long rate = config.getEmitterPeriod().getMillis();
         Future<?> scheduledFuture = scheduler.scheduleAtFixedRate(
             rate,
             rate,
             TimeUnit.MILLISECONDS,
             new CronTask()
             {
               private volatile Future<Boolean> monitorFuture = null;
               @Override
               public void run(long scheduledRunTimeMillis)
               {
                 try {
                   if (monitorFuture != null && monitorFuture.isDone()
                       && !(monitorFuture.get() && hasMonitor(monitor))) {
                     removeMonitor(monitor);
                     monitor.getScheduledFuture().cancel(false);
                     log.debug("Stopped rescheduling %s (delay %s)", this, rate);
                     return;
                   }
                   
                   log.trace("Running %s (period %s)", this, rate);
                   monitorFuture = executor.submit(new Callable<Boolean>()
                   {
                     public Boolean call()
                     {
                       try {
                         return monitor.monitor(emitter);
                       } catch (Throwable e) {
                         log.error(e, "Uncaught exception.");
                         return false;
                       }
                     }
                   });
                 } 
                 catch (Throwable e) {
                   log.error(e, "Uncaught exception.");
                 }
               }
             });
         monitor.setScheduledFuture(scheduledFuture);
       }
   
   
   In this:
   1.  Each monitor has a separate future.
   2.  Cron task is cheap, it checks a boolean condition. If condition is true it cancels the scheduling process for the 
        particular monitor, else submits a callable for monitoring to executor service 
   3.  No race condition for cancelling scheduledFuture on first iteration.
   
   Please let me know your thoughts on this.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] leventov commented on pull request #10448: Added CronScheduler support as a proof to clock drift while emitting metrics

Posted by GitBox <gi...@apache.org>.
leventov commented on pull request #10448:
URL: https://github.com/apache/druid/pull/10448#issuecomment-755968409


   > Hi @leventov, as test coverage is not very great for Druid metrics system, I'm wondering how much the CronScheduler is well-tested in production environment, especially when it runs for a long time period. Could you give me some idea?
   
   I don't know of any production deployment of CronScheduler. However, I want to note that the fact that metric/monitoring systems are not well-tested is precisely _the_ reason to introduce CronScheduler, which is engineered for the sole purpose to be more stable than standard library executors.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] leventov commented on pull request #10448: Added CronScheduler support as a proof to clock drift while emitting metrics

Posted by GitBox <gi...@apache.org>.
leventov commented on pull request #10448:
URL: https://github.com/apache/druid/pull/10448#issuecomment-729720774


   dev@druid.apache.org mailing list


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] ayushkul2910 commented on a change in pull request #10448: Added CronScheduler support as a proof to clock drift while emitting metrics

Posted by GitBox <gi...@apache.org>.
ayushkul2910 commented on a change in pull request #10448:
URL: https://github.com/apache/druid/pull/10448#discussion_r553893369



##########
File path: server/src/main/java/org/apache/druid/server/metrics/MetricsModule.java
##########
@@ -108,10 +108,10 @@ public MonitorScheduler getMonitorScheduler(
 
     return new MonitorScheduler(
         config.get(),
-        CronScheduler.newBuilder(Duration.ofSeconds(1L)).setThreadName("MonitorScheduler-%s").build(),
+        CronScheduler.newBuilder(Duration.ofSeconds(1L)).setThreadName("MonitorSchedulerThread").build(),
         emitter,
         monitors,
-        Executors.newCachedThreadPool()
+        Execs.multiThreaded(64, "MonitorThread-%d")

Review comment:
       This makes sense, we should reduce the number of monitor threads.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] ayushkul2910 commented on pull request #10448: Added CronScheduler support as a proof to clock drift while emitting metrics

Posted by GitBox <gi...@apache.org>.
ayushkul2910 commented on pull request #10448:
URL: https://github.com/apache/druid/pull/10448#issuecomment-729618624


   Thanks @leventov for reviewing the PR. Can you please let me know, how can I reach out to other active maintainers?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] ayushkul2910 commented on a change in pull request #10448: Added CronScheduler support as a proof to clock drift while emitting metrics

Posted by GitBox <gi...@apache.org>.
ayushkul2910 commented on a change in pull request #10448:
URL: https://github.com/apache/druid/pull/10448#discussion_r553893369



##########
File path: server/src/main/java/org/apache/druid/server/metrics/MetricsModule.java
##########
@@ -108,10 +108,10 @@ public MonitorScheduler getMonitorScheduler(
 
     return new MonitorScheduler(
         config.get(),
-        CronScheduler.newBuilder(Duration.ofSeconds(1L)).setThreadName("MonitorScheduler-%s").build(),
+        CronScheduler.newBuilder(Duration.ofSeconds(1L)).setThreadName("MonitorSchedulerThread").build(),
         emitter,
         monitors,
-        Executors.newCachedThreadPool()
+        Execs.multiThreaded(64, "MonitorThread-%d")

Review comment:
       This makes sense, we should reduce the number of monitor threads.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on pull request #10448: Added CronScheduler support as a proof to clock drift while emitting metrics

Posted by GitBox <gi...@apache.org>.
jihoonson commented on pull request #10448:
URL: https://github.com/apache/druid/pull/10448#issuecomment-756367474


   @leventov yeah, I understand the motivation behind making the CronScheduler. I'm just worried about its maturity because every cluster will be impacted after upgrade if there is any bug in there. I looked at the test coverage of CronScheduler which seems not bad (about 60-65%). I also tested it manually for about an hour, it seems working well. However, I can't be 100% sure there will be no unexpected bugs since I'm not familiar enough with its code. So, I think it would be better to add a feature flag for this just in case (https://github.com/apache/druid/pull/10732). In my PR, the default still uses CronScheduler, but users can choose the legacy monitor scheduler if they want.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] ayushkul2910 commented on a change in pull request #10448: Added CronScheduler support as a proof to clock drift while emitting metrics

Posted by GitBox <gi...@apache.org>.
ayushkul2910 commented on a change in pull request #10448:
URL: https://github.com/apache/druid/pull/10448#discussion_r507186926



##########
File path: core/src/main/java/org/apache/druid/java/util/common/concurrent/ScheduledExecutors.java
##########
@@ -167,6 +169,50 @@ public void run()
     );
   }
 
+  public static void scheduleAtFixedRate(CronScheduler exec, Duration rate, Callable<Signal> callable)
+  {
+    scheduleAtFixedRate(exec, rate, rate, callable);
+  }
+
+  /**
+   * Run callable once every period, after the given initial delay. Uses
+   * {@link CronScheduler} for task scheduling. Exceptions are caught and logged
+   * as errors.
+   */
+  public static void scheduleAtFixedRate(
+      final CronScheduler exec,
+      final Duration initialDelay,
+      final Duration rate,
+      final Callable<Signal> callable
+  )
+  {
+    log.debug("Scheduling periodically: %s with period %s", callable, rate);
+    Instant delayInstance = Instant.now().plusMillis(initialDelay.getMillis());
+    exec.scheduleAt(delayInstance,

Review comment:
       That makes sense. I'll change this.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] ayushkul2910 commented on a change in pull request #10448: Added CronScheduler support as a proof to clock drift while emitting metrics

Posted by GitBox <gi...@apache.org>.
ayushkul2910 commented on a change in pull request #10448:
URL: https://github.com/apache/druid/pull/10448#discussion_r511941702



##########
File path: core/src/main/java/org/apache/druid/java/util/common/concurrent/ScheduledExecutors.java
##########
@@ -167,6 +169,50 @@ public void run()
     );
   }
 
+  public static void scheduleAtFixedRate(CronScheduler exec, Duration rate, Callable<Signal> callable)
+  {
+    scheduleAtFixedRate(exec, rate, rate, callable);
+  }
+
+  /**
+   * Run callable once every period, after the given initial delay. Uses
+   * {@link CronScheduler} for task scheduling. Exceptions are caught and logged
+   * as errors.
+   */
+  public static void scheduleAtFixedRate(
+      final CronScheduler exec,
+      final Duration initialDelay,
+      final Duration rate,
+      final Callable<Signal> callable
+  )
+  {
+    log.debug("Scheduling periodically: %s with period %s", callable, rate);
+    Instant delayInstance = Instant.now().plusMillis(initialDelay.getMillis());
+    exec.scheduleAt(delayInstance,

Review comment:
       Hey @leventov, do you think the below implementation for `startMonitor(final Monitor monitor)` method in `MonitorScheduler` class will suffice? Also, can this cause any inconsistency  since `scheduledFuture` is volatile and is shared amongst all the monitors?
   
       synchronized (lock) {
         monitor.start();
         Long rate = config.getEmitterPeriod().getMillis();
         scheduledFuture = scheduler.scheduleAtFixedRate(
             rate,
             rate,
             TimeUnit.MILLISECONDS,
             new CronTask()
             {
               @Override
               public void run(long scheduledRunTimeMillis)
               {
                 try {
                   if (monitor.monitor(emitter) && hasMonitor(monitor)) {
                     log.trace("Running %s (period %s)", this, rate);
                   } else {
                     log.debug("Stopping rescheduling %s (delay %s)", this, rate);
                     removeMonitor(monitor);
                     while (scheduledFuture == null) {
                       Thread.sleep(1);
                     }
                     scheduledFuture.cancel(false);
                   }
                 } catch (Throwable e) {
                   log.error(e, "Uncaught exception.");
                 }
               }
             });
       }




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] ayushkul2910 commented on pull request #10448: Added CronScheduler support as a proof to clock drift while emitting metrics

Posted by GitBox <gi...@apache.org>.
ayushkul2910 commented on pull request #10448:
URL: https://github.com/apache/druid/pull/10448#issuecomment-723453308


   Hi @leventov, I have updated the PR according to your suggestions. Can you please review once again?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] leventov commented on a change in pull request #10448: Added CronScheduler support as a proof to clock drift while emitting metrics

Posted by GitBox <gi...@apache.org>.
leventov commented on a change in pull request #10448:
URL: https://github.com/apache/druid/pull/10448#discussion_r507181662



##########
File path: core/src/main/java/org/apache/druid/java/util/common/concurrent/ScheduledExecutors.java
##########
@@ -167,6 +169,50 @@ public void run()
     );
   }
 
+  public static void scheduleAtFixedRate(CronScheduler exec, Duration rate, Callable<Signal> callable)
+  {
+    scheduleAtFixedRate(exec, rate, rate, callable);
+  }
+
+  /**
+   * Run callable once every period, after the given initial delay. Uses
+   * {@link CronScheduler} for task scheduling. Exceptions are caught and logged
+   * as errors.
+   */
+  public static void scheduleAtFixedRate(
+      final CronScheduler exec,
+      final Duration initialDelay,
+      final Duration rate,
+      final Callable<Signal> callable
+  )
+  {
+    log.debug("Scheduling periodically: %s with period %s", callable, rate);
+    Instant delayInstance = Instant.now().plusMillis(initialDelay.getMillis());
+    exec.scheduleAt(delayInstance,

Review comment:
       I think this logic: scheduling one-shot task which reschedules itself, is problematic. I think we should remove these methods (together with the `Signal` enum) because they have negative utility.
   
   In particular, for `CronScheduler`, constant rescheduling is probably prone to some drift. Instead, `CronScheduler`'s methods like `scheduleAtFixedRate()` should be used directly from `MonitorScheduler`. The periodic task can be cancelled using the returned `Future`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] ayushkul2910 commented on a change in pull request #10448: Added CronScheduler support as a proof to clock drift while emitting metrics

Posted by GitBox <gi...@apache.org>.
ayushkul2910 commented on a change in pull request #10448:
URL: https://github.com/apache/druid/pull/10448#discussion_r553162349



##########
File path: server/src/main/java/org/apache/druid/server/metrics/MetricsModule.java
##########
@@ -108,10 +108,10 @@ public MonitorScheduler getMonitorScheduler(
 
     return new MonitorScheduler(
         config.get(),
-        CronScheduler.newBuilder(Duration.ofSeconds(1L)).setThreadName("MonitorScheduler-%s").build(),
+        CronScheduler.newBuilder(Duration.ofSeconds(1L)).setThreadName("MonitorSchedulerThread").build(),
         emitter,
         monitors,
-        Executors.newCachedThreadPool()
+        Execs.multiThreaded(64, "MonitorThread-%d")

Review comment:
       Hi @jihoonson , sorry for delay in response. 
   I think currently there are ~20 monitors, which can run concurrently with the MonitorScheduler class. Suppose a case in which frequency of scheduling < time taken by the executor thread to do `monitor.monitor(...)`(Although I am not sure if this case is possible in practical, kind of edge case). This can result in queuing of the tasks if threads are very less. I think we should atleast have no. of threads equal to max number of monitors supported. I may be missing something here. What do you think?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #10448: Added CronScheduler support as a proof to clock drift while emitting metrics

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #10448:
URL: https://github.com/apache/druid/pull/10448#discussion_r553607855



##########
File path: server/src/main/java/org/apache/druid/server/metrics/MetricsModule.java
##########
@@ -108,10 +108,10 @@ public MonitorScheduler getMonitorScheduler(
 
     return new MonitorScheduler(
         config.get(),
-        CronScheduler.newBuilder(Duration.ofSeconds(1L)).setThreadName("MonitorScheduler-%s").build(),
+        CronScheduler.newBuilder(Duration.ofSeconds(1L)).setThreadName("MonitorSchedulerThread").build(),
         emitter,
         monitors,
-        Executors.newCachedThreadPool()
+        Execs.multiThreaded(64, "MonitorThread-%d")

Review comment:
       The monitor usually takes less than 1 sec (probably less than 100 ms) while the emission period is large enough to run all monitors (1 min by default). So, I think the scenario you described can happen when there are some failures such as retrying metrics emission due to some network issue. However, I don't think we should handle these failures by employing multiple threads because there is nothing we can do better with more threads. I would rather not schedule a new monitor task if the previous one is still running. I implemented this in #10732.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] leventov commented on a change in pull request #10448: Added CronScheduler support as a proof to clock drift while emitting metrics

Posted by GitBox <gi...@apache.org>.
leventov commented on a change in pull request #10448:
URL: https://github.com/apache/druid/pull/10448#discussion_r512203193



##########
File path: core/src/main/java/org/apache/druid/java/util/common/concurrent/ScheduledExecutors.java
##########
@@ -167,6 +169,50 @@ public void run()
     );
   }
 
+  public static void scheduleAtFixedRate(CronScheduler exec, Duration rate, Callable<Signal> callable)
+  {
+    scheduleAtFixedRate(exec, rate, rate, callable);
+  }
+
+  /**
+   * Run callable once every period, after the given initial delay. Uses
+   * {@link CronScheduler} for task scheduling. Exceptions are caught and logged
+   * as errors.
+   */
+  public static void scheduleAtFixedRate(
+      final CronScheduler exec,
+      final Duration initialDelay,
+      final Duration rate,
+      final Callable<Signal> callable
+  )
+  {
+    log.debug("Scheduling periodically: %s with period %s", callable, rate);
+    Instant delayInstance = Instant.now().plusMillis(initialDelay.getMillis());
+    exec.scheduleAt(delayInstance,

Review comment:
       There should be a separate future for every monitor. I also think there should be a separate executorService for running `monitor.monitor(emitter)`, and cancelling the future from the first monitor, for two reasons:
    1. Monitor code is not guaranteed to be non-blocking and "cheap", which is the requirement of CronScheduler;
    2. You can avoid having a race condition of cancelling a future on the first iteration when it's not yet created.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] ayushkul2910 commented on a change in pull request #10448: Added CronScheduler support as a proof to clock drift while emitting metrics

Posted by GitBox <gi...@apache.org>.
ayushkul2910 commented on a change in pull request #10448:
URL: https://github.com/apache/druid/pull/10448#discussion_r512608527



##########
File path: core/src/main/java/org/apache/druid/java/util/common/concurrent/ScheduledExecutors.java
##########
@@ -167,6 +169,50 @@ public void run()
     );
   }
 
+  public static void scheduleAtFixedRate(CronScheduler exec, Duration rate, Callable<Signal> callable)
+  {
+    scheduleAtFixedRate(exec, rate, rate, callable);
+  }
+
+  /**
+   * Run callable once every period, after the given initial delay. Uses
+   * {@link CronScheduler} for task scheduling. Exceptions are caught and logged
+   * as errors.
+   */
+  public static void scheduleAtFixedRate(
+      final CronScheduler exec,
+      final Duration initialDelay,
+      final Duration rate,
+      final Callable<Signal> callable
+  )
+  {
+    log.debug("Scheduling periodically: %s with period %s", callable, rate);
+    Instant delayInstance = Instant.now().plusMillis(initialDelay.getMillis());
+    exec.scheduleAt(delayInstance,

Review comment:
       Keeping the above points in mind, I think this implementation will do.
   
       synchronized (lock) {
         monitor.start();
         Long rate = config.getEmitterPeriod().getMillis();
         Future<?> scheduledFuture = scheduler.scheduleAtFixedRate(
             rate,
             rate,
             TimeUnit.MILLISECONDS,
             new CronTask()
             {
               private volatile Future<Boolean> monitorFuture = null;
               @Override
               public void run(long scheduledRunTimeMillis)
               {
                 try {
                   if (monitorFuture != null && monitorFuture.isDone()
                       && !(monitorFuture.get() && hasMonitor(monitor))) {
                     removeMonitor(monitor);
                     monitor.getScheduledFuture().cancel(false);
                     log.debug("Stopped rescheduling %s (delay %s)", this, rate);
                     return;
                   }
                   
                   log.trace("Running %s (period %s)", this, rate);
                   monitorFuture = executor.submit(new Callable<Boolean>()
                   {
                     public Boolean call()
                     {
                       try {
                         return monitor.monitor(emitter);
                       } catch (Throwable e) {
                         log.error(e, "Uncaught exception.");
                         return false;
                       }
                     }
                   });
                 } 
                 catch (Throwable e) {
                   log.error(e, "Uncaught exception.");
                 }
               }
             });
         monitor.setScheduledFuture(scheduledFuture);
       }
   
   
   In this:
   1.  Each monitor has a separate future.
   2.  Cron task is cheap, it checks a boolean condition. If condition is true it cancels the scheduling process for the particular monitor, else submits a callable for monitoring to executor service 
   3.  No race condition for cancelling scheduledFuture on first iteration.
   
   Please let me know your thoughts on this.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] ayushkul2910 commented on pull request #10448: Added CronScheduler support as a proof to clock drift while emitting metrics

Posted by GitBox <gi...@apache.org>.
ayushkul2910 commented on pull request #10448:
URL: https://github.com/apache/druid/pull/10448#issuecomment-733187721


   Thanks for the review. Can we now merge this PR @QiuMM @leventov?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on pull request #10448: Added CronScheduler support as a proof to clock drift while emitting metrics

Posted by GitBox <gi...@apache.org>.
jihoonson commented on pull request #10448:
URL: https://github.com/apache/druid/pull/10448#issuecomment-755111625


   Hi @leventov, as test coverage is not very great for Druid metrics system, I'm wondering how much the CronScheduler is well-tested in production environment. I think the CronScheduler is proved working well for short period of time since all Druid integration tests passed which include those tests running for a couple of coordinator runs to run compaction and query compacted segments. But, I'm more worried about how stable it is when it runs for a long time period. Could you give me some idea?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #10448: Added CronScheduler support as a proof to clock drift while emitting metrics

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #10448:
URL: https://github.com/apache/druid/pull/10448#discussion_r552386109



##########
File path: server/src/main/java/org/apache/druid/server/metrics/MetricsModule.java
##########
@@ -108,10 +108,10 @@ public MonitorScheduler getMonitorScheduler(
 
     return new MonitorScheduler(
         config.get(),
-        CronScheduler.newBuilder(Duration.ofSeconds(1L)).setThreadName("MonitorScheduler-%s").build(),
+        CronScheduler.newBuilder(Duration.ofSeconds(1L)).setThreadName("MonitorSchedulerThread").build(),
         emitter,
         monitors,
-        Executors.newCachedThreadPool()
+        Execs.multiThreaded(64, "MonitorThread-%d")

Review comment:
       This looks like an overkill. What was the rationale for using this many threads?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org