You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/12/01 15:37:50 UTC

[GitHub] [druid] danprince1 opened a new pull request #12013: Create a standalone primary replicant loader in the Coordinator

danprince1 opened a new pull request #12013:
URL: https://github.com/apache/druid/pull/12013


   Fixes #10606.
   (This is a rework of previous PR #10699, which was closed without merging)
   
   ### Description
   
   Please reference the proposal for detailed information. I will provide a summary here:
   
   I am proposing an optional new DutiesRunnable in the DruidCoordinator. Operators can choose whether or not to break primary replicant loading out into its own DutiesRunnable. If they choose not to enable the dedicated primary replicant loading, their coordinator will function just as it always has. If they choose to enable the dedicated primary replicant loading, their coordinator will add a scheduled DutiesRunnable dedicated to executing matching LoadRule for segments and only doing the primary replicant load for that LoadRule when run. The HistoricalManagement DutiesRunnable will continue all other HistoricalManagement duties including performing non-primary replicant loading and replicant dropping while executing a matched LoadRule for a segment.
   
   My implementation for the proposal exposes two new Coordinator runtime configurations for operators: `druid.coordinator.loadPrimaryReplicantSeparately` and `druid.coordinator.period.primaryReplicantLoaderPeriod`. If they choose to enable the first, then a scheduled executor with a configurable backoff period is configured for loading primary replicants.
   
   Importantly, this PR also **introduces concurrency to the Coordinator**, via the configuration `druid.coordinator.dutiesRunnableExecutor.threadPoolSize`.  This thread poll was previously hard-coded to contain only 1 thread.
   
   I am marking this with Design Review label because the proposal never got the traction I wanted to confirm the design. Thus, this will require 2 +1's from committers before merge.
   
   <hr>
   
   ##### Key changed/added classes in this PR
    * `DruidCoordinator`
    * `LoadRule`
   
   <hr>
   
   This PR has:
   - [x] been self-reviewed.
      - [x] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [x] added documentation for new or modified features or behaviors.
   - [x] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [x] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [x] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [x] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] capistrant commented on a change in pull request #12013: Create a standalone primary replicant loader in the Coordinator

Posted by GitBox <gi...@apache.org>.
capistrant commented on a change in pull request #12013:
URL: https://github.com/apache/druid/pull/12013#discussion_r762099478



##########
File path: docs/configuration/index.md
##########
@@ -781,7 +781,10 @@ These Coordinator static configurations can be defined in the `coordinator/runti
 
 |Property|Description|Default|
 |--------|-----------|-------|
+|`druid.coordinator.dutiesRunnableExecutor.threadPoolSize`|The number of threads used by the `ScheduledExecutorService` that executes `DutiesRunnable` objects|1|
 |`druid.coordinator.period`|The run period for the Coordinator. The Coordinator operates by maintaining the current state of the world in memory and periodically looking at the set of "used" segments and segments being served to make decisions about whether any changes need to be made to the data topology. This property sets the delay between each of these runs.|PT60S|
+|`druid.coordinator.loadPrimaryReplicantSeparately`|Flag that indicates if the Coordinator should put primary replicant loading for used segments on its own event loop. This will require more memory and CPU to be used by the coordinator, but will ensure that the loading of primary replicas for used segments is not blocked by other coordinator jobs. The default of false is likely fine for most clusters. Enabling it could make sense if you regularly have to wait longer than you'd like for coordination cycles to complete before primary replicants are loaded for used segments that are not being served by the cluster.|false|

Review comment:
       let's include a note that if you set this to true, you will also want to bump the threadPoolSize to make it worthwhile. A more involved question would be, do we want to add a precondition to coordinator startup that if this is true, threadPoolSize must be `> 1`? That would prevent operators from mistakenly misconfiguring their cluster

##########
File path: server/src/main/java/org/apache/druid/server/coordinator/DruidCoordinator.java
##########
@@ -1017,9 +1137,11 @@ void stopPeonsForDisappearedServers(List<ImmutableDruidServer> servers)
         disappeared.remove(server.getName());
       }
       for (String name : disappeared) {
-        log.debug("Removing listener for server[%s] which is no longer there.", name);
         LoadQueuePeon peon = loadManagementPeons.remove(name);
-        peon.stop();
+        if (null != peon) {

Review comment:
       nit: comment about the race condition we found might be worthwhile here to fill future readers in on why this null check was found

##########
File path: server/src/main/java/org/apache/druid/server/coordinator/SegmentReplicantLookup.java
##########
@@ -130,6 +130,12 @@ private int getLoadingReplicants(SegmentId segmentId)
     return retVal;
   }
 
+  public Map<String, Integer> getLoadingTiers(SegmentId segmentId)

Review comment:
       javadoc could be helpful

##########
File path: server/src/test/java/org/apache/druid/server/http/CoordinatorDynamicConfigTest.java
##########
@@ -272,7 +273,7 @@ public void testDecommissioningParametersBackwardCompatibility() throws Exceptio
         0,
         false,
         false,
-        Integer.MAX_VALUE
+        0

Review comment:
       any specific reasoning behind flipping this value around in some of the tests?

##########
File path: server/src/main/java/org/apache/druid/server/coordinator/DruidCoordinator.java
##########
@@ -900,6 +1012,7 @@ public void run()
           );
         }
 
+        log.info("Duties group %s starting", dutiesRunnableAlias);

Review comment:
       This could be candidate for debug level

##########
File path: server/src/main/java/org/apache/druid/server/coordinator/DruidCoordinator.java
##########
@@ -665,68 +681,91 @@ private void becomeLeader()
       final int startingLeaderCounter = coordLeaderSelector.localTerm();
 
       final List<Pair<? extends DutiesRunnable, Duration>> dutiesRunnables = new ArrayList<>();
+      LoadRule.LoadRuleMode historicalManagementDutiesLoadRuleMode = LoadRule.LoadRuleMode.ALL;
+      if (config.isLoadPrimaryReplicantSeparately()) {
+        dutiesRunnables.add(
+            Pair.of(
+                new DutiesRunnable(
+                    makePrimaryReplicantManagementDuties(),
+                    startingLeaderCounter,
+                    PRIMARY_REPLICANT_LOADING_DUTIES_GROUP,
+                    RunRules.RunRulesMode.LOAD_RULE_ONLY,
+                    LoadRule.LoadRuleMode.PRIMARY_ONLY
+                ),
+                config.getCoordinatorPrimaryReplicantLoaderPeriod()
+            )
+        );
+        historicalManagementDutiesLoadRuleMode = LoadRule.LoadRuleMode.REPLICA_ONLY;
+      }
       dutiesRunnables.add(
           Pair.of(
-              new DutiesRunnable(makeHistoricalManagementDuties(), startingLeaderCounter, HISTORICAL_MANAGEMENT_DUTIES_DUTY_GROUP),
+              new DutiesRunnable(
+                  makeHistoricalManagementDuties(),
+                  startingLeaderCounter,
+                  HISTORICAL_MANAGEMENT_DUTIES_DUTY_GROUP,
+                  historicalManagementDutiesLoadRuleMode
+              ),
               config.getCoordinatorPeriod()
           )
       );
+      //noinspection VariableNotUsedInsideIf
       if (indexingServiceClient != null) {
         dutiesRunnables.add(
             Pair.of(
-                new DutiesRunnable(makeIndexingServiceDuties(), startingLeaderCounter, INDEXING_SERVICE_DUTIES_DUTY_GROUP),
+                new DutiesRunnable(
+                    makeIndexingServiceDuties(),
+                    startingLeaderCounter,
+                    INDEXING_SERVICE_DUTIES_DUTY_GROUP
+                ),
                 config.getCoordinatorIndexingPeriod()
             )
         );
       }
       dutiesRunnables.add(
           Pair.of(
-              new DutiesRunnable(makeMetadataStoreManagementDuties(), startingLeaderCounter, METADATA_STORE_MANAGEMENT_DUTIES_DUTY_GROUP),
+              new DutiesRunnable(
+                  makeMetadataStoreManagementDuties(),
+                  startingLeaderCounter,
+                  METADATA_STORE_MANAGEMENT_DUTIES_DUTY_GROUP
+              ),
               config.getCoordinatorMetadataStoreManagementPeriod()
           )
       );
 
       for (CoordinatorCustomDutyGroup customDutyGroup : customDutyGroups.getCoordinatorCustomDutyGroups()) {
         dutiesRunnables.add(
             Pair.of(
-                new DutiesRunnable(customDutyGroup.getCustomDutyList(), startingLeaderCounter, customDutyGroup.getName()),
+                new DutiesRunnable(
+                    customDutyGroup.getCustomDutyList(),
+                    startingLeaderCounter,
+                    customDutyGroup.getName()
+                ),
                 customDutyGroup.getPeriod()
             )
         );
         log.info(
             "Done making custom coordinator duties %s for group %s",
-            customDutyGroup.getCustomDutyList().stream().map(duty -> duty.getClass().getName()).collect(Collectors.toList()),
+            customDutyGroup.getCustomDutyList()
+                           .stream()
+                           .map(duty -> duty.getClass().getName())
+                           .collect(Collectors.toList()),
             customDutyGroup.getName()
         );
       }
 
       for (final Pair<? extends DutiesRunnable, Duration> dutiesRunnable : dutiesRunnables) {
-        // CompactSegmentsDuty can takes a non trival amount of time to complete.
-        // Hence, we schedule at fixed rate to make sure the other tasks still run at approximately every
-        // config.getCoordinatorIndexingPeriod() period. Note that cautious should be taken
-        // if setting config.getCoordinatorIndexingPeriod() lower than the default value.
-        ScheduledExecutors.scheduleAtFixedRate(
-            exec,
-            config.getCoordinatorStartDelay(),
-            dutiesRunnable.rhs,
-            new Callable<ScheduledExecutors.Signal>()
-            {
-              private final DutiesRunnable theRunnable = dutiesRunnable.lhs;
-
-              @Override
-              public ScheduledExecutors.Signal call()
-              {
-                if (coordLeaderSelector.isLeader() && startingLeaderCounter == coordLeaderSelector.localTerm()) {
-                  theRunnable.run();
-                }
-                if (coordLeaderSelector.isLeader()
-                    && startingLeaderCounter == coordLeaderSelector.localTerm()) { // (We might no longer be leader)
-                  return ScheduledExecutors.Signal.REPEAT;
-                } else {
-                  return ScheduledExecutors.Signal.STOP;
-                }
+        // Note that caution should be taken if setting config.getCoordinatorIndexingPeriod() lower than the default

Review comment:
       Retaining the comments from master may be preferred here

##########
File path: server/src/main/java/org/apache/druid/server/coordinator/DruidCoordinator.java
##########
@@ -665,68 +681,91 @@ private void becomeLeader()
       final int startingLeaderCounter = coordLeaderSelector.localTerm();
 
       final List<Pair<? extends DutiesRunnable, Duration>> dutiesRunnables = new ArrayList<>();
+      LoadRule.LoadRuleMode historicalManagementDutiesLoadRuleMode = LoadRule.LoadRuleMode.ALL;
+      if (config.isLoadPrimaryReplicantSeparately()) {
+        dutiesRunnables.add(
+            Pair.of(
+                new DutiesRunnable(
+                    makePrimaryReplicantManagementDuties(),
+                    startingLeaderCounter,
+                    PRIMARY_REPLICANT_LOADING_DUTIES_GROUP,
+                    RunRules.RunRulesMode.LOAD_RULE_ONLY,
+                    LoadRule.LoadRuleMode.PRIMARY_ONLY
+                ),
+                config.getCoordinatorPrimaryReplicantLoaderPeriod()
+            )
+        );
+        historicalManagementDutiesLoadRuleMode = LoadRule.LoadRuleMode.REPLICA_ONLY;
+      }
       dutiesRunnables.add(
           Pair.of(
-              new DutiesRunnable(makeHistoricalManagementDuties(), startingLeaderCounter, HISTORICAL_MANAGEMENT_DUTIES_DUTY_GROUP),
+              new DutiesRunnable(
+                  makeHistoricalManagementDuties(),
+                  startingLeaderCounter,
+                  HISTORICAL_MANAGEMENT_DUTIES_DUTY_GROUP,
+                  historicalManagementDutiesLoadRuleMode
+              ),
               config.getCoordinatorPeriod()
           )
       );
+      //noinspection VariableNotUsedInsideIf
       if (indexingServiceClient != null) {
         dutiesRunnables.add(
             Pair.of(
-                new DutiesRunnable(makeIndexingServiceDuties(), startingLeaderCounter, INDEXING_SERVICE_DUTIES_DUTY_GROUP),
+                new DutiesRunnable(
+                    makeIndexingServiceDuties(),
+                    startingLeaderCounter,
+                    INDEXING_SERVICE_DUTIES_DUTY_GROUP
+                ),
                 config.getCoordinatorIndexingPeriod()
             )
         );
       }
       dutiesRunnables.add(
           Pair.of(
-              new DutiesRunnable(makeMetadataStoreManagementDuties(), startingLeaderCounter, METADATA_STORE_MANAGEMENT_DUTIES_DUTY_GROUP),
+              new DutiesRunnable(
+                  makeMetadataStoreManagementDuties(),
+                  startingLeaderCounter,
+                  METADATA_STORE_MANAGEMENT_DUTIES_DUTY_GROUP
+              ),
               config.getCoordinatorMetadataStoreManagementPeriod()
           )
       );
 
       for (CoordinatorCustomDutyGroup customDutyGroup : customDutyGroups.getCoordinatorCustomDutyGroups()) {
         dutiesRunnables.add(
             Pair.of(
-                new DutiesRunnable(customDutyGroup.getCustomDutyList(), startingLeaderCounter, customDutyGroup.getName()),
+                new DutiesRunnable(
+                    customDutyGroup.getCustomDutyList(),
+                    startingLeaderCounter,
+                    customDutyGroup.getName()
+                ),
                 customDutyGroup.getPeriod()
             )
         );
         log.info(
             "Done making custom coordinator duties %s for group %s",
-            customDutyGroup.getCustomDutyList().stream().map(duty -> duty.getClass().getName()).collect(Collectors.toList()),
+            customDutyGroup.getCustomDutyList()
+                           .stream()
+                           .map(duty -> duty.getClass().getName())
+                           .collect(Collectors.toList()),
             customDutyGroup.getName()
         );
       }
 
       for (final Pair<? extends DutiesRunnable, Duration> dutiesRunnable : dutiesRunnables) {
-        // CompactSegmentsDuty can takes a non trival amount of time to complete.
-        // Hence, we schedule at fixed rate to make sure the other tasks still run at approximately every
-        // config.getCoordinatorIndexingPeriod() period. Note that cautious should be taken
-        // if setting config.getCoordinatorIndexingPeriod() lower than the default value.
-        ScheduledExecutors.scheduleAtFixedRate(
-            exec,
-            config.getCoordinatorStartDelay(),
-            dutiesRunnable.rhs,
-            new Callable<ScheduledExecutors.Signal>()
-            {
-              private final DutiesRunnable theRunnable = dutiesRunnable.lhs;
-
-              @Override
-              public ScheduledExecutors.Signal call()
-              {
-                if (coordLeaderSelector.isLeader() && startingLeaderCounter == coordLeaderSelector.localTerm()) {
-                  theRunnable.run();
-                }
-                if (coordLeaderSelector.isLeader()
-                    && startingLeaderCounter == coordLeaderSelector.localTerm()) { // (We might no longer be leader)
-                  return ScheduledExecutors.Signal.REPEAT;
-                } else {
-                  return ScheduledExecutors.Signal.STOP;
-                }
+        // Note that caution should be taken if setting config.getCoordinatorIndexingPeriod() lower than the default
+        // value.
+        exec.scheduleAtFixedRate(

Review comment:
       this block of code may need closer look. The previous code had an explicit STOP if not leader, want to make sure that is actually no longer needed with this change instead of left out on accident

##########
File path: server/src/main/java/org/apache/druid/server/coordinator/DruidCoordinator.java
##########
@@ -756,6 +795,15 @@ private void stopBeingLeader()
     }
   }
 
+  private List<CoordinatorDuty> makePrimaryReplicantManagementDuties()

Review comment:
       quick javadoc calling out what these duties are for could be helpful

##########
File path: server/src/main/java/org/apache/druid/server/coordinator/DruidCoordinator.java
##########
@@ -924,6 +1038,7 @@ public void run()
                 .setDimension(DruidMetrics.DUTY_GROUP, dutiesRunnableAlias)
                 .build("coordinator/global/time", TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - globalStart))
         );
+        log.info("Duties group %s complete", dutiesRunnableAlias);

Review comment:
       same debug level comment as above

##########
File path: server/src/main/java/org/apache/druid/server/coordinator/DruidCoordinator.java
##########
@@ -665,68 +681,91 @@ private void becomeLeader()
       final int startingLeaderCounter = coordLeaderSelector.localTerm();
 
       final List<Pair<? extends DutiesRunnable, Duration>> dutiesRunnables = new ArrayList<>();
+      LoadRule.LoadRuleMode historicalManagementDutiesLoadRuleMode = LoadRule.LoadRuleMode.ALL;

Review comment:
       ```suggestion
         // By default, the historical management duties runnable will handle all segment loading, regardless of primary/non-primary replicant status. However, if the operator enables primary replicant loading, this will flip to only load non-primary replicants since all primary replicants will be loaded by the dedicated duties runnable instantiated below.
         LoadRule.LoadRuleMode historicalManagementDutiesLoadRuleMode = LoadRule.LoadRuleMode.ALL;
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] danprince1 commented on a change in pull request #12013: Create a standalone primary replicant loader in the Coordinator

Posted by GitBox <gi...@apache.org>.
danprince1 commented on a change in pull request #12013:
URL: https://github.com/apache/druid/pull/12013#discussion_r776867062



##########
File path: server/src/main/java/org/apache/druid/server/coordinator/DruidCoordinator.java
##########
@@ -665,68 +681,91 @@ private void becomeLeader()
       final int startingLeaderCounter = coordLeaderSelector.localTerm();
 
       final List<Pair<? extends DutiesRunnable, Duration>> dutiesRunnables = new ArrayList<>();
+      LoadRule.LoadRuleMode historicalManagementDutiesLoadRuleMode = LoadRule.LoadRuleMode.ALL;
+      if (config.isLoadPrimaryReplicantSeparately()) {
+        dutiesRunnables.add(
+            Pair.of(
+                new DutiesRunnable(
+                    makePrimaryReplicantManagementDuties(),
+                    startingLeaderCounter,
+                    PRIMARY_REPLICANT_LOADING_DUTIES_GROUP,
+                    RunRules.RunRulesMode.LOAD_RULE_ONLY,
+                    LoadRule.LoadRuleMode.PRIMARY_ONLY
+                ),
+                config.getCoordinatorPrimaryReplicantLoaderPeriod()
+            )
+        );
+        historicalManagementDutiesLoadRuleMode = LoadRule.LoadRuleMode.REPLICA_ONLY;
+      }
       dutiesRunnables.add(
           Pair.of(
-              new DutiesRunnable(makeHistoricalManagementDuties(), startingLeaderCounter, HISTORICAL_MANAGEMENT_DUTIES_DUTY_GROUP),
+              new DutiesRunnable(
+                  makeHistoricalManagementDuties(),
+                  startingLeaderCounter,
+                  HISTORICAL_MANAGEMENT_DUTIES_DUTY_GROUP,
+                  historicalManagementDutiesLoadRuleMode
+              ),
               config.getCoordinatorPeriod()
           )
       );
+      //noinspection VariableNotUsedInsideIf
       if (indexingServiceClient != null) {
         dutiesRunnables.add(
             Pair.of(
-                new DutiesRunnable(makeIndexingServiceDuties(), startingLeaderCounter, INDEXING_SERVICE_DUTIES_DUTY_GROUP),
+                new DutiesRunnable(
+                    makeIndexingServiceDuties(),
+                    startingLeaderCounter,
+                    INDEXING_SERVICE_DUTIES_DUTY_GROUP
+                ),
                 config.getCoordinatorIndexingPeriod()
             )
         );
       }
       dutiesRunnables.add(
           Pair.of(
-              new DutiesRunnable(makeMetadataStoreManagementDuties(), startingLeaderCounter, METADATA_STORE_MANAGEMENT_DUTIES_DUTY_GROUP),
+              new DutiesRunnable(
+                  makeMetadataStoreManagementDuties(),
+                  startingLeaderCounter,
+                  METADATA_STORE_MANAGEMENT_DUTIES_DUTY_GROUP
+              ),
               config.getCoordinatorMetadataStoreManagementPeriod()
           )
       );
 
       for (CoordinatorCustomDutyGroup customDutyGroup : customDutyGroups.getCoordinatorCustomDutyGroups()) {
         dutiesRunnables.add(
             Pair.of(
-                new DutiesRunnable(customDutyGroup.getCustomDutyList(), startingLeaderCounter, customDutyGroup.getName()),
+                new DutiesRunnable(
+                    customDutyGroup.getCustomDutyList(),
+                    startingLeaderCounter,
+                    customDutyGroup.getName()
+                ),
                 customDutyGroup.getPeriod()
             )
         );
         log.info(
             "Done making custom coordinator duties %s for group %s",
-            customDutyGroup.getCustomDutyList().stream().map(duty -> duty.getClass().getName()).collect(Collectors.toList()),
+            customDutyGroup.getCustomDutyList()
+                           .stream()
+                           .map(duty -> duty.getClass().getName())
+                           .collect(Collectors.toList()),
             customDutyGroup.getName()
         );
       }
 
       for (final Pair<? extends DutiesRunnable, Duration> dutiesRunnable : dutiesRunnables) {
-        // CompactSegmentsDuty can takes a non trival amount of time to complete.
-        // Hence, we schedule at fixed rate to make sure the other tasks still run at approximately every
-        // config.getCoordinatorIndexingPeriod() period. Note that cautious should be taken
-        // if setting config.getCoordinatorIndexingPeriod() lower than the default value.
-        ScheduledExecutors.scheduleAtFixedRate(
-            exec,
-            config.getCoordinatorStartDelay(),
-            dutiesRunnable.rhs,
-            new Callable<ScheduledExecutors.Signal>()
-            {
-              private final DutiesRunnable theRunnable = dutiesRunnable.lhs;
-
-              @Override
-              public ScheduledExecutors.Signal call()
-              {
-                if (coordLeaderSelector.isLeader() && startingLeaderCounter == coordLeaderSelector.localTerm()) {
-                  theRunnable.run();
-                }
-                if (coordLeaderSelector.isLeader()
-                    && startingLeaderCounter == coordLeaderSelector.localTerm()) { // (We might no longer be leader)
-                  return ScheduledExecutors.Signal.REPEAT;
-                } else {
-                  return ScheduledExecutors.Signal.STOP;
-                }
+        // Note that caution should be taken if setting config.getCoordinatorIndexingPeriod() lower than the default
+        // value.
+        exec.scheduleAtFixedRate(

Review comment:
       Yeah, this is a good point, but I think we're OK.  This code used to use `ScheduledExecutors.scheduleAtFixedRate()`, which is a home-grown version of the jvm `ScheduledExecutorService.scheduleAtFixedRate()`.  I found that the druid version would often start multiple versions of the same Runnable at a time, whereas the jvm version [guarantees](https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ScheduledExecutorService.html#scheduleAtFixedRate-java.lang.Runnable-long-long-java.util.concurrent.TimeUnit-) that this won't happen.
   
   Yes, the druid version has the 'signaling' stuff which keeps it from running again if it's not the leader, but we also check for that in the beginning of `DruidCoordinator.run()` anyway, so it seems safe.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] danprince1 commented on a change in pull request #12013: Create a standalone primary replicant loader in the Coordinator

Posted by GitBox <gi...@apache.org>.
danprince1 commented on a change in pull request #12013:
URL: https://github.com/apache/druid/pull/12013#discussion_r776843147



##########
File path: server/src/test/java/org/apache/druid/server/http/CoordinatorDynamicConfigTest.java
##########
@@ -272,7 +273,7 @@ public void testDecommissioningParametersBackwardCompatibility() throws Exceptio
         0,
         false,
         false,
-        Integer.MAX_VALUE
+        0

Review comment:
       Previously these tests were just assigning the default value to `maxNonPrimaryReplicantsToLoad`.  I don't remember, but I think I must have changed these to specify a non-default value in order to check that deserializing was actually working.  




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] danprince1 commented on a change in pull request #12013: Create a standalone primary replicant loader in the Coordinator

Posted by GitBox <gi...@apache.org>.
danprince1 commented on a change in pull request #12013:
URL: https://github.com/apache/druid/pull/12013#discussion_r776803870



##########
File path: docs/configuration/index.md
##########
@@ -781,7 +781,10 @@ These Coordinator static configurations can be defined in the `coordinator/runti
 
 |Property|Description|Default|
 |--------|-----------|-------|
+|`druid.coordinator.dutiesRunnableExecutor.threadPoolSize`|The number of threads used by the `ScheduledExecutorService` that executes `DutiesRunnable` objects|1|
 |`druid.coordinator.period`|The run period for the Coordinator. The Coordinator operates by maintaining the current state of the world in memory and periodically looking at the set of "used" segments and segments being served to make decisions about whether any changes need to be made to the data topology. This property sets the delay between each of these runs.|PT60S|
+|`druid.coordinator.loadPrimaryReplicantSeparately`|Flag that indicates if the Coordinator should put primary replicant loading for used segments on its own event loop. This will require more memory and CPU to be used by the coordinator, but will ensure that the loading of primary replicas for used segments is not blocked by other coordinator jobs. The default of false is likely fine for most clusters. Enabling it could make sense if you regularly have to wait longer than you'd like for coordination cycles to complete before primary replicants are loaded for used segments that are not being served by the cluster.|false|

Review comment:
       The idea of validating the config is a good one, and I went down a rabbit hole investigating how configurations are validated elsewhere in the druid code.  There are two schemes in use in the codebase for getting config properties deserialized into java objects.  The first uses the [skife](https://github.com/brianm/config-magic) library, and the second is a home-grown class that integrates validation annotations (see `JsonConfigurator`).  Apparently there is a (very slow) movement under way to standardize on the second method (see #920).  Unfortunately `DruidCoordinatorConfig` still uses the old way, which doesn't provide any validation.  So, the 'right' way to fix this would be to migrate coordinator config to the new way and add validation.  This would also require a custom validator, as none of the existing ones cover the relationship between two different properties.
   
   For now I'll add the doc and a simple check in DruidCoordinator.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org