You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@helix.apache.org by GitBox <gi...@apache.org> on 2020/08/27 22:09:26 UTC

[GitHub] [helix] NealSun96 opened a new pull request #1326: Task Framework IdealState Removal

NealSun96 opened a new pull request #1326:
URL: https://github.com/apache/helix/pull/1326


   ### Issues
   
   - [x] My PR addresses the following Helix issues and references them in the PR description:
   
   #1323, #1324, #1325 
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI changes:
   
   This PR removes IdealState usage from the task framework pipeline. Instead, now workflow resources are created directly using WorkflowConfig and JobConfig. As a result, the legacy pipeline logic in TaskSchedulingStage is removed: it was there for the case of "IdealState exists but not WorkflowConfig", which is no longer possible now. 
   
   After the removal of legacy pipeline logic, numerous unintended usage of the legacy pipeline were uncovered in the form of broken tests; these tests are addressed and fixed. 
   
   At the same time, 2 bugs in the pipeline were also uncovered and fixed: the first one is about tasks being incorrectly rejected; the second one is about negative scheduling delay not properly rejected. 
   
   ### Tests
   
   - [x] The following is the result of the "mvn test" command on the appropriate module:
   
   ```
   [ERROR] Tests run: 1173, Failures: 2, Errors: 0, Skipped: 1, Time elapsed: 4,319.525 s <<< FAILURE! - in TestSuite
   [ERROR] testEnableCompressionResource(org.apache.helix.integration.TestEnableCompression)  Time elapsed: 162.594 s  <<< FAILURE!
   java.lang.AssertionError: expected:<true> but was:<false>
           at org.apache.helix.integration.TestEnableCompression.testEnableCompressionResource(TestEnableCompression.java:117)
   
   [ERROR] testPeriodicRefresh(org.apache.helix.integration.spectator.TestRoutingTableProviderPeriodicRefresh)  Time elapsed: 2.012 s  <<< FAILURE!
   java.lang.AssertionError: expected:<4> but was:<3>
           at org.apache.helix.integration.spectator.TestRoutingTableProviderPeriodicRefresh.testPeriodicRefresh(TestRoutingTableProviderPeriodicRefresh.java:211)
   
   [INFO] 
   [INFO] Results:
   [INFO] 
   [ERROR] Failures: 
   [ERROR]   TestEnableCompression.testEnableCompressionResource:117 expected:<true> but was:<false>
   [ERROR]   TestRoutingTableProviderPeriodicRefresh.testPeriodicRefresh:211 expected:<4> but was:<3>
   [INFO] 
   [ERROR] Tests run: 1173, Failures: 2, Errors: 0, Skipped: 1
   [INFO] 
   [INFO] ------------------------------------------------------------------------
   [INFO] BUILD FAILURE
   [INFO] ------------------------------------------------------------------------
   [INFO] Total time:  01:12 h
   [INFO] Finished at: 2020-08-27T13:06:45-07:00
   [INFO] ------------------------------------------------------------------------
   ```
   Rerun
   ```
   [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 30.334 s - in TestSuite
   [INFO] 
   [INFO] Results:
   [INFO] 
   [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0
   [INFO] 
   [INFO] ------------------------------------------------------------------------
   [INFO] BUILD SUCCESS
   [INFO] ------------------------------------------------------------------------
   [INFO] Total time:  35.472 s
   [INFO] Finished at: 2020-08-27T14:42:52-07:00
   [INFO] ------------------------------------------------------------------------
   ```
   
   (Link the GitHub wiki you added)
   
   ### Commits
   
   - My commits all reference appropriate Apache Helix GitHub issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
     1. Subject is separated from body by a blank line
     1. Subject is limited to 50 characters (not including Jira issue reference)
     1. Subject does not end with a period
     1. Subject uses the imperative mood ("add", not "adding")
     1. Body wraps at 72 characters
     1. Body explains "what" and "why", not "how"
   
   ### Code Quality
   
   - My diff has been formatted using helix-style.xml 
   (helix-style-intellij.xml if IntelliJ IDE is used)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] jiajunwang commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

jiajunwang commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r487255601



##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       As we discussed, please add a TODO there for the future change. For now, it is OK to be left here.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r478726579



##########
File path: helix-core/src/main/java/org/apache/helix/task/WorkflowDispatcher.java
##########
@@ -84,6 +84,7 @@ public void updateWorkflowStatus(String workflow, WorkflowConfig workflowCfg,
     TargetState targetState = workflowCfg.getTargetState();
     if (targetState == TargetState.DELETE) {
       LOG.info("Workflow is marked as deleted " + workflow + " cleaning up the workflow context.");
+      updateInflightJobs(workflow, workflowCtx, currentStateOutput, bestPossibleOutput);

Review comment:
       All `updateInflightJobs` are making up for a case that wasn't covered before: any time a workflow exits this function before inflight jobs are handled, the inflight jobs will not be processed and will fallback to the legacy pipeline logic. This isn't correct as inflight jobs need to respond to workflow states such as TimedOut. Therefore, the new logic is to handle inflight jobs before exiting this function. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r483253636



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -59,6 +62,7 @@ public void process(ClusterEvent event) throws Exception {
 
     Map<String, Resource> resourceMap = new LinkedHashMap<>();
     Map<String, Resource> resourceToRebalance = new LinkedHashMap<>();
+    Map<String, Resource> taskResourcesToDrop = new LinkedHashMap<>();

Review comment:
       Good catch, probably not. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] alirezazamani commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

alirezazamani commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r482212002



##########
File path: helix-core/src/main/java/org/apache/helix/task/WorkflowDispatcher.java
##########
@@ -290,7 +297,6 @@ private void scheduleJobs(String workflow, WorkflowConfig workflowCfg,
         // Time is not ready. Set a trigger and update the start time.
         // Check if the job is ready to be executed.
         if (System.currentTimeMillis() >= workflowCtx.getJobStartTime(job)) {
-          scheduleSingleJob(job, jobConfig);

Review comment:
       Can remind me why scheduleSignleJob is removed? We already schedule the job in other places?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r484025032



##########
File path: helix-core/src/main/java/org/apache/helix/controller/rebalancer/util/RebalanceScheduler.java
##########
@@ -80,6 +80,7 @@ public void scheduleRebalance(HelixManager manager, String resource, long startT
     long delay = startTime - System.currentTimeMillis();
     if (delay < 0) {
       LOG.debug(String.format("Delay time is %s, will not be scheduled", delay));
+      return;

Review comment:
       The initial motivation was that 1 test case is failing. After digging around, it turns out that 1 negative delay was passed into the function, which removes a scheduled rebalance at a later time. A part of the fix was that the negative delay shouldn't be passed (it was unintentional), and the other part of the fix was #1325. 
   
   I'm unable to produce any errors now that negative delays are not passed, so I can no longer provide such a scenario. 
   
   #1325 should either be resolved by rejecting negative delay or fixing the description and expectation of this function. For now I'll remove it from this PR. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] narendly commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

narendly commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r483232917



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -98,6 +101,43 @@ public void process(ClusterEvent event) throws Exception {
       }
     }
 
+    // Add TaskFramework resources from workflow and job configs as Task Framework will no longer
+    // use IdealState
+    if (isTaskCache) {
+      WorkflowControllerDataProvider taskDataCache =
+          event.getAttribute(AttributeName.ControllerDataProvider.name());
+      for (Map.Entry<String, WorkflowConfig> workflowConfigEntry : taskDataCache
+          .getWorkflowConfigMap().entrySet()) {
+        // always overwrite, because the resource could be created by IS

Review comment:
       Nit: the resource could "have been created" based on IS? Since we aren't doing IS-based task resource creation anymore.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] jiajunwang commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

jiajunwang commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r483345696



##########
File path: helix-core/src/main/java/org/apache/helix/controller/rebalancer/util/RebalanceScheduler.java
##########
@@ -80,6 +80,7 @@ public void scheduleRebalance(HelixManager manager, String resource, long startT
     long delay = startTime - System.currentTimeMillis();
     if (delay < 0) {
       LOG.debug(String.format("Delay time is %s, will not be scheduled", delay));
+      return;

Review comment:
       1. If this change is not absolutely necessary for this PR, let's split.
   
   2. Does it really cause any trouble for us? We should do the right thing, not what has been written down as a comment.
   The potential problem I see here is that, as you mentioned, the possible gap between the timestamp generated and the time when this method starts processing. The delayed rebalancer has this potential issue if you return prematurely. So if you want to change it, then you need to "fix" the delayed rebalance. But back to the first point, I don't think it is a problem and the comment is the only thing we shall fix.
   
   Anyway, please convince me with a scenario that without return, our logic will fail. Otherwise, #1325 does not look like an issue to me.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] jiajunwang commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

jiajunwang commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r486786354



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -124,17 +205,15 @@ public void process(ClusterEvent event) throws Exception {
 
           // don't overwrite ideal state settings
           if (!resourceMap.containsKey(resourceName)) {
-            addResource(resourceName, resourceMap);
-            Resource resource = resourceMap.get(resourceName);
+            Resource resource = new Resource(resourceName);
             resource.setStateModelDefRef(currentState.getStateModelDefRef());
             resource.setStateModelFactoryName(currentState.getStateModelFactoryName());
             resource.setBucketSize(currentState.getBucketSize());
             resource.setBatchMessageMode(currentState.getBatchMessageMode());
-            if (resource.getStateModelDefRef() == null && !isTaskCache
-                || resource.getStateModelDefRef() != null && (
-                resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME) && isTaskCache
-                    || !resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME)
-                    && !isTaskCache)) {
+            if (!isTaskCache && !TaskConstants.STATE_MODEL_NAME

Review comment:
       How about "isTaskCache == TaskConstants.STATE_MODEL_NAME.equals(resource.getStateModelDefRef())"
   You can comment that we are trying to match for 2 conditions:
   1. If resource state model def is null, then we only proceed if it is a regular resource pipeline.
   2. If resource state model def is not null, then we only proceed if the pipeline type matches the resource state model type.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -98,6 +102,15 @@ public void process(ClusterEvent event) throws Exception {
       }
     }

Review comment:
       Yeah, that's not too bad.
   I was thinking if the RESOURCES attribute is still necessary. But might be. For example, for the targeted jobs. So let's keep it for now.

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       Then, can we add scheduleOnDemandPipeline into cleanupJobIdealStateExtView() ? Otherwise, cleanupJobIdealStateExtView is not a complete call that triggers the expected rebalance, right?
   It also helps to simplify code, IMO.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -185,4 +250,21 @@ private void addPartition(String partition, String resourceName, Map<String, Res
     resource.addPartition(partition);
 
   }
+
+  private void addResourceConfigToResourceMap(String resourceName, ResourceConfig resourceConfig,
+      ClusterConfig clusterConfig, Map<String, Resource> resourceMap,
+      Map<String, Resource> resourceToRebalance) {
+    Resource resource = new Resource(resourceName, clusterConfig, resourceConfig);
+    resourceMap.put(resourceName, resource);

Review comment:
       Not a must, but we did add a IS to ResourceConfig conversion method. It is ResourceConfig.mergeIdealStateWithResourceConfig(). So in theory, you can keep only this method by converting the IS into ResourceConfig, then use this method to add to the resource map.
   1. This helps to reduce duplicate code.
   2. ResourceConfig is the one we are going to use in the near future. IS will be transformed for controller output only. So that logic will be replaced anyway.
   
   Up to you if you want to do it now or in the future. If in the future, please add a TODO here.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -185,4 +250,21 @@ private void addPartition(String partition, String resourceName, Map<String, Res
     resource.addPartition(partition);
 
   }
+
+  private void addResourceConfigToResourceMap(String resourceName, ResourceConfig resourceConfig,
+      ClusterConfig clusterConfig, Map<String, Resource> resourceMap,
+      Map<String, Resource> resourceToRebalance) {
+    Resource resource = new Resource(resourceName, clusterConfig, resourceConfig);
+    resourceMap.put(resourceName, resource);

Review comment:
       Sure, that also makes sense.

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       This is necessary. So no matter who does the cleanup later, he or she won't forget to remove the scheduleOnDemandPipeline(). High-level, these 2 logics are bundled. And we shall not assume everyone remembers this relationship by their memory or document. Organize the code to ensure it.

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       As we discussed, please add a TODO there for the future change. For now, it is OK to be left here.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -124,17 +205,15 @@ public void process(ClusterEvent event) throws Exception {
 
           // don't overwrite ideal state settings
           if (!resourceMap.containsKey(resourceName)) {
-            addResource(resourceName, resourceMap);
-            Resource resource = resourceMap.get(resourceName);
+            Resource resource = new Resource(resourceName);
             resource.setStateModelDefRef(currentState.getStateModelDefRef());
             resource.setStateModelFactoryName(currentState.getStateModelFactoryName());
             resource.setBucketSize(currentState.getBucketSize());
             resource.setBatchMessageMode(currentState.getBatchMessageMode());
-            if (resource.getStateModelDefRef() == null && !isTaskCache
-                || resource.getStateModelDefRef() != null && (
-                resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME) && isTaskCache
-                    || !resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME)
-                    && !isTaskCache)) {
+            if (!isTaskCache && !TaskConstants.STATE_MODEL_NAME

Review comment:
       How about "isTaskCache == TaskConstants.STATE_MODEL_NAME.equals(resource.getStateModelDefRef())"
   You can comment that we are trying to match for 2 conditions:
   1. If resource state model def is null, then we only proceed if it is a regular resource pipeline.
   2. If resource state model def is not null, then we only proceed if the pipeline type matches the resource state model type.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -98,6 +102,15 @@ public void process(ClusterEvent event) throws Exception {
       }
     }

Review comment:
       Yeah, that's not too bad.
   I was thinking if the RESOURCES attribute is still necessary. But might be. For example, for the targeted jobs. So let's keep it for now.

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       Then, can we add scheduleOnDemandPipeline into cleanupJobIdealStateExtView() ? Otherwise, cleanupJobIdealStateExtView is not a complete call that triggers the expected rebalance, right?
   It also helps to simplify code, IMO.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -185,4 +250,21 @@ private void addPartition(String partition, String resourceName, Map<String, Res
     resource.addPartition(partition);
 
   }
+
+  private void addResourceConfigToResourceMap(String resourceName, ResourceConfig resourceConfig,
+      ClusterConfig clusterConfig, Map<String, Resource> resourceMap,
+      Map<String, Resource> resourceToRebalance) {
+    Resource resource = new Resource(resourceName, clusterConfig, resourceConfig);
+    resourceMap.put(resourceName, resource);

Review comment:
       Not a must, but we did add a IS to ResourceConfig conversion method. It is ResourceConfig.mergeIdealStateWithResourceConfig(). So in theory, you can keep only this method by converting the IS into ResourceConfig, then use this method to add to the resource map.
   1. This helps to reduce duplicate code.
   2. ResourceConfig is the one we are going to use in the near future. IS will be transformed for controller output only. So that logic will be replaced anyway.
   
   Up to you if you want to do it now or in the future. If in the future, please add a TODO here.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -185,4 +250,21 @@ private void addPartition(String partition, String resourceName, Map<String, Res
     resource.addPartition(partition);
 
   }
+
+  private void addResourceConfigToResourceMap(String resourceName, ResourceConfig resourceConfig,
+      ClusterConfig clusterConfig, Map<String, Resource> resourceMap,
+      Map<String, Resource> resourceToRebalance) {
+    Resource resource = new Resource(resourceName, clusterConfig, resourceConfig);
+    resourceMap.put(resourceName, resource);

Review comment:
       Sure, that also makes sense.

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       This is necessary. So no matter who does the cleanup later, he or she won't forget to remove the scheduleOnDemandPipeline(). High-level, these 2 logics are bundled. And we shall not assume everyone remembers this relationship by their memory or document. Organize the code to ensure it.

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       As we discussed, please add a TODO there for the future change. For now, it is OK to be left here.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -124,17 +205,15 @@ public void process(ClusterEvent event) throws Exception {
 
           // don't overwrite ideal state settings
           if (!resourceMap.containsKey(resourceName)) {
-            addResource(resourceName, resourceMap);
-            Resource resource = resourceMap.get(resourceName);
+            Resource resource = new Resource(resourceName);
             resource.setStateModelDefRef(currentState.getStateModelDefRef());
             resource.setStateModelFactoryName(currentState.getStateModelFactoryName());
             resource.setBucketSize(currentState.getBucketSize());
             resource.setBatchMessageMode(currentState.getBatchMessageMode());
-            if (resource.getStateModelDefRef() == null && !isTaskCache
-                || resource.getStateModelDefRef() != null && (
-                resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME) && isTaskCache
-                    || !resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME)
-                    && !isTaskCache)) {
+            if (!isTaskCache && !TaskConstants.STATE_MODEL_NAME

Review comment:
       How about "isTaskCache == TaskConstants.STATE_MODEL_NAME.equals(resource.getStateModelDefRef())"
   You can comment that we are trying to match for 2 conditions:
   1. If resource state model def is null, then we only proceed if it is a regular resource pipeline.
   2. If resource state model def is not null, then we only proceed if the pipeline type matches the resource state model type.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -98,6 +102,15 @@ public void process(ClusterEvent event) throws Exception {
       }
     }

Review comment:
       Yeah, that's not too bad.
   I was thinking if the RESOURCES attribute is still necessary. But might be. For example, for the targeted jobs. So let's keep it for now.

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       Then, can we add scheduleOnDemandPipeline into cleanupJobIdealStateExtView() ? Otherwise, cleanupJobIdealStateExtView is not a complete call that triggers the expected rebalance, right?
   It also helps to simplify code, IMO.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -185,4 +250,21 @@ private void addPartition(String partition, String resourceName, Map<String, Res
     resource.addPartition(partition);
 
   }
+
+  private void addResourceConfigToResourceMap(String resourceName, ResourceConfig resourceConfig,
+      ClusterConfig clusterConfig, Map<String, Resource> resourceMap,
+      Map<String, Resource> resourceToRebalance) {
+    Resource resource = new Resource(resourceName, clusterConfig, resourceConfig);
+    resourceMap.put(resourceName, resource);

Review comment:
       Not a must, but we did add a IS to ResourceConfig conversion method. It is ResourceConfig.mergeIdealStateWithResourceConfig(). So in theory, you can keep only this method by converting the IS into ResourceConfig, then use this method to add to the resource map.
   1. This helps to reduce duplicate code.
   2. ResourceConfig is the one we are going to use in the near future. IS will be transformed for controller output only. So that logic will be replaced anyway.
   
   Up to you if you want to do it now or in the future. If in the future, please add a TODO here.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -185,4 +250,21 @@ private void addPartition(String partition, String resourceName, Map<String, Res
     resource.addPartition(partition);
 
   }
+
+  private void addResourceConfigToResourceMap(String resourceName, ResourceConfig resourceConfig,
+      ClusterConfig clusterConfig, Map<String, Resource> resourceMap,
+      Map<String, Resource> resourceToRebalance) {
+    Resource resource = new Resource(resourceName, clusterConfig, resourceConfig);
+    resourceMap.put(resourceName, resource);

Review comment:
       Sure, that also makes sense.

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       This is necessary. So no matter who does the cleanup later, he or she won't forget to remove the scheduleOnDemandPipeline(). High-level, these 2 logics are bundled. And we shall not assume everyone remembers this relationship by their memory or document. Organize the code to ensure it.

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       As we discussed, please add a TODO there for the future change. For now, it is OK to be left here.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -124,17 +205,15 @@ public void process(ClusterEvent event) throws Exception {
 
           // don't overwrite ideal state settings
           if (!resourceMap.containsKey(resourceName)) {
-            addResource(resourceName, resourceMap);
-            Resource resource = resourceMap.get(resourceName);
+            Resource resource = new Resource(resourceName);
             resource.setStateModelDefRef(currentState.getStateModelDefRef());
             resource.setStateModelFactoryName(currentState.getStateModelFactoryName());
             resource.setBucketSize(currentState.getBucketSize());
             resource.setBatchMessageMode(currentState.getBatchMessageMode());
-            if (resource.getStateModelDefRef() == null && !isTaskCache
-                || resource.getStateModelDefRef() != null && (
-                resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME) && isTaskCache
-                    || !resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME)
-                    && !isTaskCache)) {
+            if (!isTaskCache && !TaskConstants.STATE_MODEL_NAME

Review comment:
       How about "isTaskCache == TaskConstants.STATE_MODEL_NAME.equals(resource.getStateModelDefRef())"
   You can comment that we are trying to match for 2 conditions:
   1. If resource state model def is null, then we only proceed if it is a regular resource pipeline.
   2. If resource state model def is not null, then we only proceed if the pipeline type matches the resource state model type.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -98,6 +102,15 @@ public void process(ClusterEvent event) throws Exception {
       }
     }

Review comment:
       Yeah, that's not too bad.
   I was thinking if the RESOURCES attribute is still necessary. But might be. For example, for the targeted jobs. So let's keep it for now.

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       Then, can we add scheduleOnDemandPipeline into cleanupJobIdealStateExtView() ? Otherwise, cleanupJobIdealStateExtView is not a complete call that triggers the expected rebalance, right?
   It also helps to simplify code, IMO.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -185,4 +250,21 @@ private void addPartition(String partition, String resourceName, Map<String, Res
     resource.addPartition(partition);
 
   }
+
+  private void addResourceConfigToResourceMap(String resourceName, ResourceConfig resourceConfig,
+      ClusterConfig clusterConfig, Map<String, Resource> resourceMap,
+      Map<String, Resource> resourceToRebalance) {
+    Resource resource = new Resource(resourceName, clusterConfig, resourceConfig);
+    resourceMap.put(resourceName, resource);

Review comment:
       Not a must, but we did add a IS to ResourceConfig conversion method. It is ResourceConfig.mergeIdealStateWithResourceConfig(). So in theory, you can keep only this method by converting the IS into ResourceConfig, then use this method to add to the resource map.
   1. This helps to reduce duplicate code.
   2. ResourceConfig is the one we are going to use in the near future. IS will be transformed for controller output only. So that logic will be replaced anyway.
   
   Up to you if you want to do it now or in the future. If in the future, please add a TODO here.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -185,4 +250,21 @@ private void addPartition(String partition, String resourceName, Map<String, Res
     resource.addPartition(partition);
 
   }
+
+  private void addResourceConfigToResourceMap(String resourceName, ResourceConfig resourceConfig,
+      ClusterConfig clusterConfig, Map<String, Resource> resourceMap,
+      Map<String, Resource> resourceToRebalance) {
+    Resource resource = new Resource(resourceName, clusterConfig, resourceConfig);
+    resourceMap.put(resourceName, resource);

Review comment:
       Sure, that also makes sense.

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       This is necessary. So no matter who does the cleanup later, he or she won't forget to remove the scheduleOnDemandPipeline(). High-level, these 2 logics are bundled. And we shall not assume everyone remembers this relationship by their memory or document. Organize the code to ensure it.

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       As we discussed, please add a TODO there for the future change. For now, it is OK to be left here.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -124,17 +205,15 @@ public void process(ClusterEvent event) throws Exception {
 
           // don't overwrite ideal state settings
           if (!resourceMap.containsKey(resourceName)) {
-            addResource(resourceName, resourceMap);
-            Resource resource = resourceMap.get(resourceName);
+            Resource resource = new Resource(resourceName);
             resource.setStateModelDefRef(currentState.getStateModelDefRef());
             resource.setStateModelFactoryName(currentState.getStateModelFactoryName());
             resource.setBucketSize(currentState.getBucketSize());
             resource.setBatchMessageMode(currentState.getBatchMessageMode());
-            if (resource.getStateModelDefRef() == null && !isTaskCache
-                || resource.getStateModelDefRef() != null && (
-                resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME) && isTaskCache
-                    || !resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME)
-                    && !isTaskCache)) {
+            if (!isTaskCache && !TaskConstants.STATE_MODEL_NAME

Review comment:
       How about "isTaskCache == TaskConstants.STATE_MODEL_NAME.equals(resource.getStateModelDefRef())"
   You can comment that we are trying to match for 2 conditions:
   1. If resource state model def is null, then we only proceed if it is a regular resource pipeline.
   2. If resource state model def is not null, then we only proceed if the pipeline type matches the resource state model type.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -98,6 +102,15 @@ public void process(ClusterEvent event) throws Exception {
       }
     }

Review comment:
       Yeah, that's not too bad.
   I was thinking if the RESOURCES attribute is still necessary. But might be. For example, for the targeted jobs. So let's keep it for now.

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       Then, can we add scheduleOnDemandPipeline into cleanupJobIdealStateExtView() ? Otherwise, cleanupJobIdealStateExtView is not a complete call that triggers the expected rebalance, right?
   It also helps to simplify code, IMO.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -185,4 +250,21 @@ private void addPartition(String partition, String resourceName, Map<String, Res
     resource.addPartition(partition);
 
   }
+
+  private void addResourceConfigToResourceMap(String resourceName, ResourceConfig resourceConfig,
+      ClusterConfig clusterConfig, Map<String, Resource> resourceMap,
+      Map<String, Resource> resourceToRebalance) {
+    Resource resource = new Resource(resourceName, clusterConfig, resourceConfig);
+    resourceMap.put(resourceName, resource);

Review comment:
       Not a must, but we did add a IS to ResourceConfig conversion method. It is ResourceConfig.mergeIdealStateWithResourceConfig(). So in theory, you can keep only this method by converting the IS into ResourceConfig, then use this method to add to the resource map.
   1. This helps to reduce duplicate code.
   2. ResourceConfig is the one we are going to use in the near future. IS will be transformed for controller output only. So that logic will be replaced anyway.
   
   Up to you if you want to do it now or in the future. If in the future, please add a TODO here.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -185,4 +250,21 @@ private void addPartition(String partition, String resourceName, Map<String, Res
     resource.addPartition(partition);
 
   }
+
+  private void addResourceConfigToResourceMap(String resourceName, ResourceConfig resourceConfig,
+      ClusterConfig clusterConfig, Map<String, Resource> resourceMap,
+      Map<String, Resource> resourceToRebalance) {
+    Resource resource = new Resource(resourceName, clusterConfig, resourceConfig);
+    resourceMap.put(resourceName, resource);

Review comment:
       Sure, that also makes sense.

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       This is necessary. So no matter who does the cleanup later, he or she won't forget to remove the scheduleOnDemandPipeline(). High-level, these 2 logics are bundled. And we shall not assume everyone remembers this relationship by their memory or document. Organize the code to ensure it.

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       As we discussed, please add a TODO there for the future change. For now, it is OK to be left here.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r483278130



##########
File path: helix-core/src/main/java/org/apache/helix/task/WorkflowDispatcher.java
##########
@@ -322,64 +328,6 @@ private void processJob(String job, CurrentStateOutput currentStateOutput,
     }
   }
 
-  /**
-   * Posts new job to cluster
-   */
-  private void scheduleSingleJob(String jobResource, JobConfig jobConfig) {

Review comment:
       :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r487200442



##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       `cleanupJobIdealStateExtView()` is kept for backward-compatibility purpose and is marked as deprecated now; it will be removed on a later phase (it could trigger rebalance if the participant side is not updated, and it wouldn't trigger rebalance if the participant side is updated). I suppose we can put the on-demand rebalance in `cleanupJobIdealStateExtView()` now, and move it out when we delete `cleanupJobIdealStateExtView()`. Do you think that's necessary @jiajunwang ? Seems not to be too much difference. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] narendly commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

narendly commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r483234556



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -98,6 +101,43 @@ public void process(ClusterEvent event) throws Exception {
       }
     }
 
+    // Add TaskFramework resources from workflow and job configs as Task Framework will no longer
+    // use IdealState
+    if (isTaskCache) {
+      WorkflowControllerDataProvider taskDataCache =
+          event.getAttribute(AttributeName.ControllerDataProvider.name());
+      for (Map.Entry<String, WorkflowConfig> workflowConfigEntry : taskDataCache
+          .getWorkflowConfigMap().entrySet()) {
+        // always overwrite, because the resource could be created by IS
+        String resourceName = workflowConfigEntry.getKey();
+        WorkflowConfig workflowConfig = workflowConfigEntry.getValue();
+        addResourceConfigToResourceMap(resourceName, workflowConfig, cache.getClusterConfig(),
+            resourceMap, resourceToRebalance);
+        addPartition(resourceName, resourceName, resourceMap);
+      }
+
+      for (Map.Entry<String, JobConfig> jobConfigEntry : taskDataCache.getJobConfigMap()
+          .entrySet()) {
+        // always overwrite, because the resource could be created by IS
+        String resourceName = jobConfigEntry.getKey();
+        JobConfig jobConfig = jobConfigEntry.getValue();
+        addResourceConfigToResourceMap(resourceName, jobConfig, cache.getClusterConfig(),
+            resourceMap, resourceToRebalance);
+        int numPartitions = jobConfig.getTaskConfigMap().size();
+        if (numPartitions == 0 && idealStates != null) {
+          IdealState targetIs = idealStates.get(jobConfig.getTargetResource());
+          if (targetIs == null) {
+            LOG.warn("Target resource does not exist for job " + resourceName);

Review comment:
       Probably more helpful if you included the name of the target resource.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r483273500



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -185,4 +232,21 @@ private void addPartition(String partition, String resourceName, Map<String, Res
     resource.addPartition(partition);
 
   }
+
+  private void addResourceConfigToResourceMap(String resourceName, ResourceConfig resourceConfig,

Review comment:
       This PR doesn't change any of the map populating behavior: if a workflow/job was added to both maps in the past (based on IS), now they are added to both maps based on configs. With that said, your assessment on `resourceToRebalance` isn't entirely accurate: when the pipeline is for tasks, `resourceToRebalance` contains all the task resources as well. 
   
   The reason why we have both maps is because of the usages in other stages. For example, in `CurrentStateComputationStage`, task resources need to be present in both maps. There's definitely room for improvement in this aspect, but that would perhaps be in another PR. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] dasahcc commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

dasahcc commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r485111397



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -98,6 +101,43 @@ public void process(ClusterEvent event) throws Exception {
       }
     }
 
+    // Add TaskFramework resources from workflow and job configs as Task Framework will no longer
+    // use IdealState
+    if (isTaskCache) {

Review comment:
       No. We should not change it here but in TaskSchedulingStage. We do not need this specific process to select workflows. All we need is to get WorkflowConfig list in Cache at TaskSchedulingStage and loop it. These operations are unnecessary.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r487197712



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -185,4 +250,21 @@ private void addPartition(String partition, String resourceName, Map<String, Res
     resource.addPartition(partition);
 
   }
+
+  private void addResourceConfigToResourceMap(String resourceName, ResourceConfig resourceConfig,
+      ClusterConfig clusterConfig, Map<String, Resource> resourceMap,
+      Map<String, Resource> resourceToRebalance) {
+    Resource resource = new Resource(resourceName, clusterConfig, resourceConfig);
+    resourceMap.put(resourceName, resource);

Review comment:
       That is interesting. However, the next step of Task Framework will lead to the removal of this section of ResourceComputationStage - it no longer needs to create Resources from WorkflowConfigs/JobConfigs. (Check here https://github.com/apache/helix/wiki/Task-Framework-IdealState-Dependency-Removal-Progression) 
   
   I will put a TODO here in case the next step doesn't happen. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] alirezazamani commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

alirezazamani commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r485114449



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -98,6 +101,43 @@ public void process(ClusterEvent event) throws Exception {
       }
     }
 
+    // Add TaskFramework resources from workflow and job configs as Task Framework will no longer
+    // use IdealState
+    if (isTaskCache) {

Review comment:
       I think we need it here because of currentState computation stage. If we skip the resources here, we will miss the pending messages and currentStates. This can cause issue for TaskScheduling Stage.
   @NealSun96 could you please confirm it. Thanks.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] jiajunwang commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

jiajunwang commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r486786354



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -124,17 +205,15 @@ public void process(ClusterEvent event) throws Exception {
 
           // don't overwrite ideal state settings
           if (!resourceMap.containsKey(resourceName)) {
-            addResource(resourceName, resourceMap);
-            Resource resource = resourceMap.get(resourceName);
+            Resource resource = new Resource(resourceName);
             resource.setStateModelDefRef(currentState.getStateModelDefRef());
             resource.setStateModelFactoryName(currentState.getStateModelFactoryName());
             resource.setBucketSize(currentState.getBucketSize());
             resource.setBatchMessageMode(currentState.getBatchMessageMode());
-            if (resource.getStateModelDefRef() == null && !isTaskCache
-                || resource.getStateModelDefRef() != null && (
-                resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME) && isTaskCache
-                    || !resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME)
-                    && !isTaskCache)) {
+            if (!isTaskCache && !TaskConstants.STATE_MODEL_NAME

Review comment:
       How about "isTaskCache == TaskConstants.STATE_MODEL_NAME.equals(resource.getStateModelDefRef())"
   You can comment that we are trying to match for 2 conditions:
   1. If resource state model def is null, then we only proceed if it is a regular resource pipeline.
   2. If resource state model def is not null, then we only proceed if the pipeline type matches the resource state model type.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -98,6 +102,15 @@ public void process(ClusterEvent event) throws Exception {
       }
     }

Review comment:
       Yeah, that's not too bad.
   I was thinking if the RESOURCES attribute is still necessary. But might be. For example, for the targeted jobs. So let's keep it for now.

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       Then, can we add scheduleOnDemandPipeline into cleanupJobIdealStateExtView() ? Otherwise, cleanupJobIdealStateExtView is not a complete call that triggers the expected rebalance, right?
   It also helps to simplify code, IMO.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -185,4 +250,21 @@ private void addPartition(String partition, String resourceName, Map<String, Res
     resource.addPartition(partition);
 
   }
+
+  private void addResourceConfigToResourceMap(String resourceName, ResourceConfig resourceConfig,
+      ClusterConfig clusterConfig, Map<String, Resource> resourceMap,
+      Map<String, Resource> resourceToRebalance) {
+    Resource resource = new Resource(resourceName, clusterConfig, resourceConfig);
+    resourceMap.put(resourceName, resource);

Review comment:
       Not a must, but we did add a IS to ResourceConfig conversion method. It is ResourceConfig.mergeIdealStateWithResourceConfig(). So in theory, you can keep only this method by converting the IS into ResourceConfig, then use this method to add to the resource map.
   1. This helps to reduce duplicate code.
   2. ResourceConfig is the one we are going to use in the near future. IS will be transformed for controller output only. So that logic will be replaced anyway.
   
   Up to you if you want to do it now or in the future. If in the future, please add a TODO here.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -185,4 +250,21 @@ private void addPartition(String partition, String resourceName, Map<String, Res
     resource.addPartition(partition);
 
   }
+
+  private void addResourceConfigToResourceMap(String resourceName, ResourceConfig resourceConfig,
+      ClusterConfig clusterConfig, Map<String, Resource> resourceMap,
+      Map<String, Resource> resourceToRebalance) {
+    Resource resource = new Resource(resourceName, clusterConfig, resourceConfig);
+    resourceMap.put(resourceName, resource);

Review comment:
       Sure, that also makes sense.

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       This is necessary. So no matter who does the cleanup later, he or she won't forget to remove the scheduleOnDemandPipeline(). High-level, these 2 logics are bundled. And we shall not assume everyone remembers this relationship by their memory or document. Organize the code to ensure it.

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       As we discussed, please add a TODO there for the future change. For now, it is OK to be left here.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -124,17 +205,15 @@ public void process(ClusterEvent event) throws Exception {
 
           // don't overwrite ideal state settings
           if (!resourceMap.containsKey(resourceName)) {
-            addResource(resourceName, resourceMap);
-            Resource resource = resourceMap.get(resourceName);
+            Resource resource = new Resource(resourceName);
             resource.setStateModelDefRef(currentState.getStateModelDefRef());
             resource.setStateModelFactoryName(currentState.getStateModelFactoryName());
             resource.setBucketSize(currentState.getBucketSize());
             resource.setBatchMessageMode(currentState.getBatchMessageMode());
-            if (resource.getStateModelDefRef() == null && !isTaskCache
-                || resource.getStateModelDefRef() != null && (
-                resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME) && isTaskCache
-                    || !resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME)
-                    && !isTaskCache)) {
+            if (!isTaskCache && !TaskConstants.STATE_MODEL_NAME

Review comment:
       How about "isTaskCache == TaskConstants.STATE_MODEL_NAME.equals(resource.getStateModelDefRef())"
   You can comment that we are trying to match for 2 conditions:
   1. If resource state model def is null, then we only proceed if it is a regular resource pipeline.
   2. If resource state model def is not null, then we only proceed if the pipeline type matches the resource state model type.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -98,6 +102,15 @@ public void process(ClusterEvent event) throws Exception {
       }
     }

Review comment:
       Yeah, that's not too bad.
   I was thinking if the RESOURCES attribute is still necessary. But might be. For example, for the targeted jobs. So let's keep it for now.

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       Then, can we add scheduleOnDemandPipeline into cleanupJobIdealStateExtView() ? Otherwise, cleanupJobIdealStateExtView is not a complete call that triggers the expected rebalance, right?
   It also helps to simplify code, IMO.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -185,4 +250,21 @@ private void addPartition(String partition, String resourceName, Map<String, Res
     resource.addPartition(partition);
 
   }
+
+  private void addResourceConfigToResourceMap(String resourceName, ResourceConfig resourceConfig,
+      ClusterConfig clusterConfig, Map<String, Resource> resourceMap,
+      Map<String, Resource> resourceToRebalance) {
+    Resource resource = new Resource(resourceName, clusterConfig, resourceConfig);
+    resourceMap.put(resourceName, resource);

Review comment:
       Not a must, but we did add a IS to ResourceConfig conversion method. It is ResourceConfig.mergeIdealStateWithResourceConfig(). So in theory, you can keep only this method by converting the IS into ResourceConfig, then use this method to add to the resource map.
   1. This helps to reduce duplicate code.
   2. ResourceConfig is the one we are going to use in the near future. IS will be transformed for controller output only. So that logic will be replaced anyway.
   
   Up to you if you want to do it now or in the future. If in the future, please add a TODO here.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -124,17 +205,15 @@ public void process(ClusterEvent event) throws Exception {
 
           // don't overwrite ideal state settings
           if (!resourceMap.containsKey(resourceName)) {
-            addResource(resourceName, resourceMap);
-            Resource resource = resourceMap.get(resourceName);
+            Resource resource = new Resource(resourceName);
             resource.setStateModelDefRef(currentState.getStateModelDefRef());
             resource.setStateModelFactoryName(currentState.getStateModelFactoryName());
             resource.setBucketSize(currentState.getBucketSize());
             resource.setBatchMessageMode(currentState.getBatchMessageMode());
-            if (resource.getStateModelDefRef() == null && !isTaskCache
-                || resource.getStateModelDefRef() != null && (
-                resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME) && isTaskCache
-                    || !resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME)
-                    && !isTaskCache)) {
+            if (!isTaskCache && !TaskConstants.STATE_MODEL_NAME

Review comment:
       How about "isTaskCache == TaskConstants.STATE_MODEL_NAME.equals(resource.getStateModelDefRef())"
   You can comment that we are trying to match for 2 conditions:
   1. If resource state model def is null, then we only proceed if it is a regular resource pipeline.
   2. If resource state model def is not null, then we only proceed if the pipeline type matches the resource state model type.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -98,6 +102,15 @@ public void process(ClusterEvent event) throws Exception {
       }
     }

Review comment:
       Yeah, that's not too bad.
   I was thinking if the RESOURCES attribute is still necessary. But might be. For example, for the targeted jobs. So let's keep it for now.

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       Then, can we add scheduleOnDemandPipeline into cleanupJobIdealStateExtView() ? Otherwise, cleanupJobIdealStateExtView is not a complete call that triggers the expected rebalance, right?
   It also helps to simplify code, IMO.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -185,4 +250,21 @@ private void addPartition(String partition, String resourceName, Map<String, Res
     resource.addPartition(partition);
 
   }
+
+  private void addResourceConfigToResourceMap(String resourceName, ResourceConfig resourceConfig,
+      ClusterConfig clusterConfig, Map<String, Resource> resourceMap,
+      Map<String, Resource> resourceToRebalance) {
+    Resource resource = new Resource(resourceName, clusterConfig, resourceConfig);
+    resourceMap.put(resourceName, resource);

Review comment:
       Not a must, but we did add a IS to ResourceConfig conversion method. It is ResourceConfig.mergeIdealStateWithResourceConfig(). So in theory, you can keep only this method by converting the IS into ResourceConfig, then use this method to add to the resource map.
   1. This helps to reduce duplicate code.
   2. ResourceConfig is the one we are going to use in the near future. IS will be transformed for controller output only. So that logic will be replaced anyway.
   
   Up to you if you want to do it now or in the future. If in the future, please add a TODO here.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -185,4 +250,21 @@ private void addPartition(String partition, String resourceName, Map<String, Res
     resource.addPartition(partition);
 
   }
+
+  private void addResourceConfigToResourceMap(String resourceName, ResourceConfig resourceConfig,
+      ClusterConfig clusterConfig, Map<String, Resource> resourceMap,
+      Map<String, Resource> resourceToRebalance) {
+    Resource resource = new Resource(resourceName, clusterConfig, resourceConfig);
+    resourceMap.put(resourceName, resource);

Review comment:
       Sure, that also makes sense.

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       This is necessary. So no matter who does the cleanup later, he or she won't forget to remove the scheduleOnDemandPipeline(). High-level, these 2 logics are bundled. And we shall not assume everyone remembers this relationship by their memory or document. Organize the code to ensure it.

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       As we discussed, please add a TODO there for the future change. For now, it is OK to be left here.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -124,17 +205,15 @@ public void process(ClusterEvent event) throws Exception {
 
           // don't overwrite ideal state settings
           if (!resourceMap.containsKey(resourceName)) {
-            addResource(resourceName, resourceMap);
-            Resource resource = resourceMap.get(resourceName);
+            Resource resource = new Resource(resourceName);
             resource.setStateModelDefRef(currentState.getStateModelDefRef());
             resource.setStateModelFactoryName(currentState.getStateModelFactoryName());
             resource.setBucketSize(currentState.getBucketSize());
             resource.setBatchMessageMode(currentState.getBatchMessageMode());
-            if (resource.getStateModelDefRef() == null && !isTaskCache
-                || resource.getStateModelDefRef() != null && (
-                resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME) && isTaskCache
-                    || !resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME)
-                    && !isTaskCache)) {
+            if (!isTaskCache && !TaskConstants.STATE_MODEL_NAME

Review comment:
       How about "isTaskCache == TaskConstants.STATE_MODEL_NAME.equals(resource.getStateModelDefRef())"
   You can comment that we are trying to match for 2 conditions:
   1. If resource state model def is null, then we only proceed if it is a regular resource pipeline.
   2. If resource state model def is not null, then we only proceed if the pipeline type matches the resource state model type.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -98,6 +102,15 @@ public void process(ClusterEvent event) throws Exception {
       }
     }

Review comment:
       Yeah, that's not too bad.
   I was thinking if the RESOURCES attribute is still necessary. But might be. For example, for the targeted jobs. So let's keep it for now.

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       Then, can we add scheduleOnDemandPipeline into cleanupJobIdealStateExtView() ? Otherwise, cleanupJobIdealStateExtView is not a complete call that triggers the expected rebalance, right?
   It also helps to simplify code, IMO.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -185,4 +250,21 @@ private void addPartition(String partition, String resourceName, Map<String, Res
     resource.addPartition(partition);
 
   }
+
+  private void addResourceConfigToResourceMap(String resourceName, ResourceConfig resourceConfig,
+      ClusterConfig clusterConfig, Map<String, Resource> resourceMap,
+      Map<String, Resource> resourceToRebalance) {
+    Resource resource = new Resource(resourceName, clusterConfig, resourceConfig);
+    resourceMap.put(resourceName, resource);

Review comment:
       Not a must, but we did add a IS to ResourceConfig conversion method. It is ResourceConfig.mergeIdealStateWithResourceConfig(). So in theory, you can keep only this method by converting the IS into ResourceConfig, then use this method to add to the resource map.
   1. This helps to reduce duplicate code.
   2. ResourceConfig is the one we are going to use in the near future. IS will be transformed for controller output only. So that logic will be replaced anyway.
   
   Up to you if you want to do it now or in the future. If in the future, please add a TODO here.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -185,4 +250,21 @@ private void addPartition(String partition, String resourceName, Map<String, Res
     resource.addPartition(partition);
 
   }
+
+  private void addResourceConfigToResourceMap(String resourceName, ResourceConfig resourceConfig,
+      ClusterConfig clusterConfig, Map<String, Resource> resourceMap,
+      Map<String, Resource> resourceToRebalance) {
+    Resource resource = new Resource(resourceName, clusterConfig, resourceConfig);
+    resourceMap.put(resourceName, resource);

Review comment:
       Sure, that also makes sense.

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       This is necessary. So no matter who does the cleanup later, he or she won't forget to remove the scheduleOnDemandPipeline(). High-level, these 2 logics are bundled. And we shall not assume everyone remembers this relationship by their memory or document. Organize the code to ensure it.

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       As we discussed, please add a TODO there for the future change. For now, it is OK to be left here.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r484025771



##########
File path: helix-core/src/main/java/org/apache/helix/task/JobDispatcher.java
##########
@@ -94,6 +95,8 @@ public ResourceAssignment processJobStatusUpdateAndAssignment(String jobName,
           workflowResource, jobName, workflowState, jobState));
       finishJobInRuntimeJobDag(_dataProvider.getTaskDataCache(), workflowResource, jobName);
       TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobName);
+      // New pipeline trigger for workflow status update

Review comment:
       It's about how workflow needs another run to handle "all jobs completed", for example. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r481367452



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -98,6 +101,43 @@ public void process(ClusterEvent event) throws Exception {
       }
     }
 
+    // Add TaskFramework resources from workflow and job configs as Task Framework will no longer
+    // use IdealState
+    if (isTaskCache) {

Review comment:
       This logic is how we read WorkflowConfig and JobConfig to create resources in the pipeline. After removing IdealStates this is the only source for workflow resources. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] narendly commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

narendly commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r483275335



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -185,4 +232,21 @@ private void addPartition(String partition, String resourceName, Map<String, Res
     resource.addPartition(partition);
 
   }
+
+  private void addResourceConfigToResourceMap(String resourceName, ResourceConfig resourceConfig,

Review comment:
       > With that said, your assessment on `resourceToRebalance` isn't entirely accurate: when the pipeline is for tasks, `resourceToRebalance` contains all the task resources as well.
   
   Could you pinpoint what's amiss here? Do you mean to say that for the resource pipeline, resourceToRebalance also contains task resources? For the task pipeline, should resourceMap contain all resources and resourceToRebalance contain only task resources?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] alirezazamani commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

alirezazamani commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r480375367



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/task/TaskSchedulingStage.java
##########
@@ -110,117 +103,9 @@ private BestPossibleStateOutput compute(ClusterEvent event, Map<String, Resource
       restOfResources.remove(jobName);
     }
 
-    // Current rest of resources including: only current state left over ones
-    // Original resource map contains workflows + jobs + other invalid resources
-    // After removing workflows + jobs, only leftover ones will go over old rebalance pipeline.
-    for (Resource resource : restOfResources.values()) {

Review comment:
       I think previously, when currentState exist and configs are missing, we were relying on this legacy code to remove the currentState (DROP the task). However, since you removed this part, how can we ensure that functionality?

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -518,7 +519,7 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     }
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
-    TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       Maybe you can add comment for onDemand rebalances and point out why you add this line? Something like making sure next pipeline runs and context get updated maybe?

##########
File path: helix-core/src/main/java/org/apache/helix/task/JobDispatcher.java
##########
@@ -85,15 +86,15 @@ public ResourceAssignment processJobStatusUpdateAndAssignment(String jobName,
     // completed)
     TaskState workflowState = workflowCtx.getWorkflowState();
     TaskState jobState = workflowCtx.getJobState(jobName);
-    // The job is already in a final state (completed/failed).
+    // Do not include workflowState == TIMED_OUT here, as later logic needs to handle this case

Review comment:
       I am assuming this is not related to this PR? Anyways it is fine to have this comment, I just want to make sure I understand it correctly.

##########
File path: helix-core/src/main/java/org/apache/helix/task/TaskDriver.java
##########
@@ -746,13 +705,11 @@ public void deleteAndWaitForCompletion(String workflow, long timeout)
     BaseDataAccessor baseDataAccessor = _accessor.getBaseDataAccessor();
     PropertyKey.Builder keyBuilder = _accessor.keyBuilder();
 
-    String idealStatePath = keyBuilder.idealStates(workflow).getPath();
     String workflowConfigPath = keyBuilder.resourceConfig(workflow).getPath();
     String workflowContextPath = keyBuilder.workflowContext(workflow).getPath();
 
     while (System.currentTimeMillis() <= endTime) {
-      if (baseDataAccessor.exists(idealStatePath, AccessOption.PERSISTENT)
-          || baseDataAccessor.exists(workflowConfigPath, AccessOption.PERSISTENT)
+      if (baseDataAccessor.exists(workflowConfigPath, AccessOption.PERSISTENT)

Review comment:
       Can we keep TaskDriver to clean up IS for a while? What would happen to the workflows/jobs that are running if we switch to new controller? Are these IS are getting deleted anyways? What do you think?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r483273500



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -185,4 +232,21 @@ private void addPartition(String partition, String resourceName, Map<String, Res
     resource.addPartition(partition);
 
   }
+
+  private void addResourceConfigToResourceMap(String resourceName, ResourceConfig resourceConfig,

Review comment:
       This PR doesn't change any of the map populating behavior: if a workflow/job was added to both maps in the past (based on IS), now they are added to both maps based on configs. -[With that said, your assessment on `resourceToRebalance` isn't entirely accurate: when the pipeline is for tasks, `resourceToRebalance` contains all the task resources as well.]
   
   The reason why we have both maps is because of the usages in other stages. For example, in `CurrentStateComputationStage`, task resources need to be present in both maps. There's definitely room for improvement in this aspect, but that would perhaps be in another PR. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] jiajunwang commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

jiajunwang commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r483347389



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -74,9 +79,8 @@ public void process(ClusterEvent event) throws Exception {
               cache.getResourceConfig(resourceName));
           resourceMap.put(resourceName, resource);
 
-          if (!idealState.isValid() && !isTaskCache
-              || idealState.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME) && isTaskCache
-              || !idealState.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME) && !isTaskCache) {
+          if (!isTaskCache && (!idealState.isValid() || !idealState.getStateModelDefRef()

Review comment:
       Humm... this condition was not readable, even the new one is confusing. Could you please add a comment here for future readers?

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/AttributeName.java
##########
@@ -22,6 +22,7 @@
 public enum AttributeName {
   RESOURCES,
   RESOURCES_TO_REBALANCE,
+  TASK_RESOURCES_TO_DROP,

Review comment:
       Is it possible that we just check in the task dispatch stage to remove all the resources in the RESOURCES_TO_REBALANCE but with no ideal state?
   So we don't need this attribute.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -98,6 +102,15 @@ public void process(ClusterEvent event) throws Exception {
       }
     }
 
+    // Add TaskFramework resources from workflow and job configs as Task Framework will no longer
+    // use IdealState
+    if (isTaskCache) {
+      WorkflowControllerDataProvider taskDataCache =
+          event.getAttribute(AttributeName.ControllerDataProvider.name());
+      processWorkflowConfigs(taskDataCache, resourceMap, resourceToRebalance);
+      processJobConfigs(taskDataCache, resourceMap, resourceToRebalance, idealStates);
+    }
+
     // It's important to get partitions from CurrentState as well since the
     // idealState might be removed.
     Map<String, LiveInstance> availableInstances = cache.getLiveInstances();

Review comment:
       Suggest moving the following logic of processing CS to a separate method.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -98,6 +102,15 @@ public void process(ClusterEvent event) throws Exception {
       }
     }

Review comment:
       It's hard to comment with github review, but with your change, I think the whole logic is cleaner. It would either read IdealState for regular resources or read Task config for task resources.
   In this case, could you please consider the following changes,
   1. Create a TaskResourceComputationStage, because we the reuse logic has been reduced quite a lot. It is no longer reasonable to put all logic in one class.
   2. If another class is too much, we should at least make the workflow clean to fit the following structure:
   if (taskCache) {
     // process TF Resource Objects...
   } else {
     // process Regular Resource Objects...
   }
   // Read the current state to backfill any resources that need to be removed.

##########
File path: helix-core/src/main/java/org/apache/helix/task/JobDispatcher.java
##########
@@ -94,6 +95,8 @@ public ResourceAssignment processJobStatusUpdateAndAssignment(String jobName,
           workflowResource, jobName, workflowState, jobState));
       finishJobInRuntimeJobDag(_dataProvider.getTaskDataCache(), workflowResource, jobName);
       TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobName);
+      // New pipeline trigger for workflow status update

Review comment:
       nit, "job status update"?

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       2 questions.
   1. I assume the IS change will trigger the pipeline. But obviously, if the TF logic says goodbye to IS, then this won't trigger the right pipeline anymore.
   2. In this case, shall we just listen to the context change? Depends on onDemond pipeline is not a scalable solution in general.
   
   Please correct me if I misunderstood the first point.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -130,11 +143,15 @@ public void process(ClusterEvent event) throws Exception {
             resource.setStateModelFactoryName(currentState.getStateModelFactoryName());
             resource.setBucketSize(currentState.getBucketSize());
             resource.setBatchMessageMode(currentState.getBatchMessageMode());
-            if (resource.getStateModelDefRef() == null && !isTaskCache
-                || resource.getStateModelDefRef() != null && (
-                resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME) && isTaskCache
-                    || !resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME)
-                    && !isTaskCache)) {
+            if (!isTaskCache && (resource.getStateModelDefRef() == null
+                || !TaskConstants.STATE_MODEL_NAME.equals(resource.getStateModelDefRef()))) {
+              resourceToRebalance.put(resourceName, resource);

Review comment:
       1. I think taskResourcesToDrop is not necessary since we should be able to tell in the dispatch stage.
   2. It seems resourceToRebalance will be put anyway. So it seems to be (maybe) some redundant code.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -185,4 +250,21 @@ private void addPartition(String partition, String resourceName, Map<String, Res
     resource.addPartition(partition);
 
   }
+
+  private void addResourceConfigToResourceMap(String resourceName, ResourceConfig resourceConfig,
+      ClusterConfig clusterConfig, Map<String, Resource> resourceMap,
+      Map<String, Resource> resourceToRebalance) {
+    Resource resource = new Resource(resourceName, clusterConfig, resourceConfig);
+    resourceMap.put(resourceName, resource);

Review comment:
       I see much duplicate code here. Could you please check if we can use the same code for the regular resource addition?

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -74,9 +79,8 @@ public void process(ClusterEvent event) throws Exception {
               cache.getResourceConfig(resourceName));
           resourceMap.put(resourceName, resource);
 
-          if (!idealState.isValid() && !isTaskCache
-              || idealState.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME) && isTaskCache
-              || !idealState.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME) && !isTaskCache) {
+          if (!isTaskCache && (!idealState.isValid() || !idealState.getStateModelDefRef()

Review comment:
       I'm very curious why we need to rebalance an invalid IS node?

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/task/TaskSchedulingStage.java
##########
@@ -110,117 +104,17 @@ private BestPossibleStateOutput compute(ClusterEvent event, Map<String, Resource
       restOfResources.remove(jobName);
     }
 
-    // Current rest of resources including: only current state left over ones
-    // Original resource map contains workflows + jobs + other invalid resources
-    // After removing workflows + jobs, only leftover ones will go over old rebalance pipeline.
-    for (Resource resource : restOfResources.values()) {
-      if (!computeResourceBestPossibleState(event, cache, currentStateOutput, resource, output)) {
-        failureResources.add(resource.getResourceName());
-        LogUtil.logWarn(logger, _eventId,
-            "Failed to calculate best possible states for " + resource.getResourceName());
-      }
+    Map<String, Resource> taskResourcesToDrop =
+        event.getAttribute(AttributeName.TASK_RESOURCES_TO_DROP.name());
+    for (String resourceName : taskResourcesToDrop.keySet()) {
+      ResourceAssignment emptyAssignment =
+          _workflowDispatcher.buildEmptyAssignment(resourceName, currentStateOutput);
+      _workflowDispatcher.updateBestPossibleStateOutput(resourceName, emptyAssignment, output);

Review comment:
       Can this logic be handled inside the workflow dispatcher just like the job dispatcher? The current code breaks workflow dispatcher OO design and lets the Stage class takes the responsibility of the dispatcher.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] narendly commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

narendly commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r483236269



##########
File path: helix-core/src/main/java/org/apache/helix/task/TaskDriver.java
##########
@@ -762,9 +719,6 @@ public void deleteAndWaitForCompletion(String workflow, long timeout)
 
     // Deletion failed: check which step of deletion failed to complete and create an error message
     StringBuilder failed = new StringBuilder();
-    if (baseDataAccessor.exists(idealStatePath, AccessOption.PERSISTENT)) {
-      failed.append("IdealState ");
-    }

Review comment:
       Should we keep this for backward-compatibility? For legacy or existing workflows/jobs, how will their IdealStates be removed? Who will remove them?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on pull request #1326:
URL: https://github.com/apache/helix/pull/1326#issuecomment-692323898


   Final commit message:
   ## Remove IdealState Dependency from Task Framework ##
   This PR removes IdealState usage from Task Framework. The participant-side no longer creates IdealState when workflows/jobs are created. The controller-side no longer reads IdealState to create resources for Task Framework. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] jiajunwang commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

jiajunwang commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r486786354



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -124,17 +205,15 @@ public void process(ClusterEvent event) throws Exception {
 
           // don't overwrite ideal state settings
           if (!resourceMap.containsKey(resourceName)) {
-            addResource(resourceName, resourceMap);
-            Resource resource = resourceMap.get(resourceName);
+            Resource resource = new Resource(resourceName);
             resource.setStateModelDefRef(currentState.getStateModelDefRef());
             resource.setStateModelFactoryName(currentState.getStateModelFactoryName());
             resource.setBucketSize(currentState.getBucketSize());
             resource.setBatchMessageMode(currentState.getBatchMessageMode());
-            if (resource.getStateModelDefRef() == null && !isTaskCache
-                || resource.getStateModelDefRef() != null && (
-                resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME) && isTaskCache
-                    || !resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME)
-                    && !isTaskCache)) {
+            if (!isTaskCache && !TaskConstants.STATE_MODEL_NAME

Review comment:
       How about "isTaskCache == TaskConstants.STATE_MODEL_NAME.equals(resource.getStateModelDefRef())"
   You can comment that we are trying to match for 2 conditions:
   1. If resource state model def is null, then we only proceed if it is a regular resource pipeline.
   2. If resource state model def is not null, then we only proceed if the pipeline type matches the resource state model type.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -98,6 +102,15 @@ public void process(ClusterEvent event) throws Exception {
       }
     }

Review comment:
       Yeah, that's not too bad.
   I was thinking if the RESOURCES attribute is still necessary. But might be. For example, for the targeted jobs. So let's keep it for now.

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       Then, can we add scheduleOnDemandPipeline into cleanupJobIdealStateExtView() ? Otherwise, cleanupJobIdealStateExtView is not a complete call that triggers the expected rebalance, right?
   It also helps to simplify code, IMO.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -185,4 +250,21 @@ private void addPartition(String partition, String resourceName, Map<String, Res
     resource.addPartition(partition);
 
   }
+
+  private void addResourceConfigToResourceMap(String resourceName, ResourceConfig resourceConfig,
+      ClusterConfig clusterConfig, Map<String, Resource> resourceMap,
+      Map<String, Resource> resourceToRebalance) {
+    Resource resource = new Resource(resourceName, clusterConfig, resourceConfig);
+    resourceMap.put(resourceName, resource);

Review comment:
       Not a must, but we did add a IS to ResourceConfig conversion method. It is ResourceConfig.mergeIdealStateWithResourceConfig(). So in theory, you can keep only this method by converting the IS into ResourceConfig, then use this method to add to the resource map.
   1. This helps to reduce duplicate code.
   2. ResourceConfig is the one we are going to use in the near future. IS will be transformed for controller output only. So that logic will be replaced anyway.
   
   Up to you if you want to do it now or in the future. If in the future, please add a TODO here.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r485142052



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -98,6 +101,43 @@ public void process(ClusterEvent event) throws Exception {
       }
     }
 
+    // Add TaskFramework resources from workflow and job configs as Task Framework will no longer
+    // use IdealState
+    if (isTaskCache) {

Review comment:
       @dasahcc I believe this has been resolved in our offline meeting - we still need to rely on resourcesMap at this moment as @alirezazamani said. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r478727494



##########
File path: helix-core/src/test/java/org/apache/helix/controller/stages/TestQuotaConstraintSkipWorkflowAssignment.java
##########
@@ -51,12 +51,11 @@ public void beforeClass() throws Exception {
   public void testQuotaConstraintSkipWorkflowAssignment() throws Exception {
     ClusterEvent event = new ClusterEvent(ClusterEventType.Unknown);
     WorkflowControllerDataProvider cache = new WorkflowControllerDataProvider(CLUSTER_NAME);
-    JobConfig.Builder job = new JobConfig.Builder();
-
-    job.setJobCommandConfigMap(Collections.singletonMap(MockTask.JOB_DELAY, "100000"));
     TaskDriver driver = new TaskDriver(_manager);
     for (int i = 0; i < 10; i++) {
       Workflow.Builder workflow = new Workflow.Builder("Workflow" + i);
+      JobConfig.Builder job = new JobConfig.Builder();

Review comment:
       Test bug fix: the old logic causes each job to accumulate the tasks assigned to the previous job. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r486561603



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -98,6 +102,15 @@ public void process(ClusterEvent event) throws Exception {
       }
     }

Review comment:
       I wouldn't say it's interleaved, it's just how the logic should look. With what you proposed, it looks like:
   ```
   if (taskCache) {
     processIdealStates();
     processWorkflowConfigs();
     processJobConfigs();
   } else {
     processIdealStates();
   }
   backfillCurrentStates();
   ```
   If we take out the common factor, we have:
   ```
   processIdealStates();
   if (taskCache) {
     processWorkflowConfigs();
     processJobConfigs();
   }
   backfillCurrentStates();
   ```
   which is exactly what it is now. I think it's clear enough (especially given that we may remove the if block altogether later). 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] alirezazamani commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

alirezazamani commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r485073804



##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       I think this will be overkill as well and not necessary. Basically, I am looking the context as controller's output and reacting on output is not ideal from my prospective. Also, unlike regular resources, context changes are much more frequent in TF resources. So there will be many redundant pipeline runs and can cause other issue. In my opinion, we can stick to ondemand rebalancer for these corner cases.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] narendly commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

narendly commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r483236666



##########
File path: helix-core/src/main/java/org/apache/helix/task/TaskUtil.java
##########
@@ -594,58 +594,6 @@ public static PropertyKey getWorkflowConfigKey(final HelixDataAccessor accessor,
     return accessor.keyBuilder().resourceConfig(workflow);
   }
 
-  /**
-   * Cleans up IdealState and external view associated with a job.
-   * @param accessor
-   * @param job
-   * @return True if remove success, otherwise false
-   */
-  protected static boolean cleanupJobIdealStateExtView(final HelixDataAccessor accessor,
-      String job) {
-    return cleanupIdealStateExtView(accessor, job);
-  }

Review comment:
       I wonder for backward-compatibility, if we should leave this in for the time being and just mark them as deprecated. For the existing workload?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r483253349



##########
File path: helix-core/src/main/java/org/apache/helix/controller/rebalancer/util/RebalanceScheduler.java
##########
@@ -80,6 +80,7 @@ public void scheduleRebalance(HelixManager manager, String resource, long startT
     long delay = startTime - System.currentTimeMillis();
     if (delay < 0) {
       LOG.debug(String.format("Delay time is %s, will not be scheduled", delay));
+      return;

Review comment:
       The fix was done based on the expectation of this function, as seen in the comment: add a future rebalance task. In my opinion this function's duty is for future rebalance scheduling, therefore a start time in the past should be rejected as shown in the log. 
   
   If a user wants to schedule an immediate rebalance, we have `scheduleOnDemandRebalance`. If a user is calling this function incorrectly (passing in a past time), then the problem should show up during tests (by not triggering rebalance) instead of hiding it and let it slip to production; if a user is using this function correctly and wants to schedule a rebalance in the future at time=x, but somehow this function is delayed and is processed at time=x+1, then should that rebalance still happen? I don't think we should assume it's okay, therefore it should be rejected. 
   
   @jiajunwang Functionally I understand your point; code-quality-wise I think this function should do what it advertises. If there's a problem it should be caught by tests and the negative delay should be handled during development. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r484027477



##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       Your understanding is correct: that's why we need the on-demand trigger now without IS removal event. 
   
   About the second point: these are the only places IS removal events are necessary to trigger rebalance. It doesn't need to scale up. 
   
   @alirezazamani Could you chime in here on @jiajunwang 's proposal? I think context-based rebalance might be an overkill and introduce problems we are not ready to face. (Either way I don't think something like that should be tackled in this PR)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r478727137



##########
File path: helix-core/src/main/java/org/apache/helix/task/assigner/ThreadCountBasedTaskAssigner.java
##########
@@ -112,36 +111,18 @@
         continue;
       }
 
-      // TODO: Review this logic

Review comment:
       Fix #1324




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r487200442



##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       `cleanupJobIdealStateExtView()` is kept for backward-compatibility purpose and is marked as deprecated now; it will be removed on a later phase. I suppose we put the on-demand rebalance in `cleanupJobIdealStateExtView()` now, and move it out when we delete `cleanupJobIdealStateExtView()`. Do you think that's necessary @jiajunwang ? Seems not to be too much difference. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on pull request #1326:
URL: https://github.com/apache/helix/pull/1326#issuecomment-692289722


   > Code looks good to me.
   > 
   > But do you want to keep some test logic to verify that IS does NOT exist anymore even the jobs are running normally? Or do we already have a test case about it?
   
   After an offline discussion, we came to the conclusion that writing a test for feature removal is awkward and tricky to do right, and therefore it's not worth the effort at this moment. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r482249925



##########
File path: helix-core/src/main/java/org/apache/helix/task/WorkflowDispatcher.java
##########
@@ -290,7 +297,6 @@ private void scheduleJobs(String workflow, WorkflowConfig workflowCfg,
         // Time is not ready. Set a trigger and update the start time.
         // Check if the job is ready to be executed.
         if (System.currentTimeMillis() >= workflowCtx.getJobStartTime(job)) {
-          scheduleSingleJob(job, jobConfig);

Review comment:
       The only line that was kept in scheduleSingleJob() was the line about creating job context, however, that is already done in processJob. You could check my comment in scheduleSingleJob. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r483271330



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -130,11 +170,15 @@ public void process(ClusterEvent event) throws Exception {
             resource.setStateModelFactoryName(currentState.getStateModelFactoryName());
             resource.setBucketSize(currentState.getBucketSize());
             resource.setBatchMessageMode(currentState.getBatchMessageMode());
-            if (resource.getStateModelDefRef() == null && !isTaskCache
-                || resource.getStateModelDefRef() != null && (
-                resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME) && isTaskCache
-                    || !resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME)
-                    && !isTaskCache)) {
+            if (!isTaskCache && (resource.getStateModelDefRef() == null
+                || !TaskConstants.STATE_MODEL_NAME.equals(resource.getStateModelDefRef()))) {
+              resourceToRebalance.put(resourceName, resource);
+            }
+
+            if (isTaskCache && TaskConstants.STATE_MODEL_NAME
+                .equals(resource.getStateModelDefRef())) {
+              // If a task current state exists without configs, it needs to be cleaned up
+              taskResourcesToDrop.put(resourceName, resource);

Review comment:
       It will get dropped. This is not a newly introduced logic: in the old way before this PR, any job that doesn't have corresponding configs will be assigned DROPPED. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r483275261



##########
File path: helix-core/src/main/java/org/apache/helix/task/TaskDriver.java
##########
@@ -762,9 +719,6 @@ public void deleteAndWaitForCompletion(String workflow, long timeout)
 
     // Deletion failed: check which step of deletion failed to complete and create an error message
     StringBuilder failed = new StringBuilder();
-    if (baseDataAccessor.exists(idealStatePath, AccessOption.PERSISTENT)) {
-      failed.append("IdealState ");
-    }

Review comment:
       As @alirezazamani has also mentioned, I suppose we should keep this behavior for now. I will add a comment noting that this should be removed later. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r480393666



##########
File path: helix-core/src/main/java/org/apache/helix/task/TaskDriver.java
##########
@@ -746,13 +705,11 @@ public void deleteAndWaitForCompletion(String workflow, long timeout)
     BaseDataAccessor baseDataAccessor = _accessor.getBaseDataAccessor();
     PropertyKey.Builder keyBuilder = _accessor.keyBuilder();
 
-    String idealStatePath = keyBuilder.idealStates(workflow).getPath();
     String workflowConfigPath = keyBuilder.resourceConfig(workflow).getPath();
     String workflowContextPath = keyBuilder.workflowContext(workflow).getPath();
 
     while (System.currentTimeMillis() <= endTime) {
-      if (baseDataAccessor.exists(idealStatePath, AccessOption.PERSISTENT)
-          || baseDataAccessor.exists(workflowConfigPath, AccessOption.PERSISTENT)
+      if (baseDataAccessor.exists(workflowConfigPath, AccessOption.PERSISTENT)

Review comment:
       So this section you commented on is only used for testing purpose, as far as I know. 
   
   However, you raised a good point here: 
   1. Workflows relying on the current pipeline logic should be fine during the switch, because they are guaranteed to have workflow configs.
   2. Workflows relying on the old pipeline logic (no config) wouldn't be managed by the controller after this change. Is that a concern?
   
   In general, keeping the IS clean up has only one advantage, which is to clean up the leftover IS from pre-existing workflows. However I don't think that is an huge issue. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r483277045



##########
File path: helix-core/src/main/java/org/apache/helix/task/TaskUtil.java
##########
@@ -594,58 +594,6 @@ public static PropertyKey getWorkflowConfigKey(final HelixDataAccessor accessor,
     return accessor.keyBuilder().resourceConfig(workflow);
   }
 
-  /**
-   * Cleans up IdealState and external view associated with a job.
-   * @param accessor
-   * @param job
-   * @return True if remove success, otherwise false
-   */
-  protected static boolean cleanupJobIdealStateExtView(final HelixDataAccessor accessor,
-      String job) {
-    return cleanupIdealStateExtView(accessor, job);
-  }

Review comment:
       Ditto above, I'll leave them in. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r482599665



##########
File path: helix-core/src/main/java/org/apache/helix/task/TaskDriver.java
##########
@@ -746,13 +705,11 @@ public void deleteAndWaitForCompletion(String workflow, long timeout)
     BaseDataAccessor baseDataAccessor = _accessor.getBaseDataAccessor();
     PropertyKey.Builder keyBuilder = _accessor.keyBuilder();
 
-    String idealStatePath = keyBuilder.idealStates(workflow).getPath();
     String workflowConfigPath = keyBuilder.resourceConfig(workflow).getPath();
     String workflowContextPath = keyBuilder.workflowContext(workflow).getPath();
 
     while (System.currentTimeMillis() <= endTime) {
-      if (baseDataAccessor.exists(idealStatePath, AccessOption.PERSISTENT)
-          || baseDataAccessor.exists(workflowConfigPath, AccessOption.PERSISTENT)
+      if (baseDataAccessor.exists(workflowConfigPath, AccessOption.PERSISTENT)

Review comment:
       @alirezazamani Could you take another look at this?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r483841145



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/AttributeName.java
##########
@@ -22,6 +22,7 @@
 public enum AttributeName {
   RESOURCES,
   RESOURCES_TO_REBALANCE,
+  TASK_RESOURCES_TO_DROP,

Review comment:
       It's not about ideal state but rather configs. However with that said, I think you raised a good point; I can just check the cache. Let me explore in that direction. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] narendly commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

narendly commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r483235161



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -130,11 +170,15 @@ public void process(ClusterEvent event) throws Exception {
             resource.setStateModelFactoryName(currentState.getStateModelFactoryName());
             resource.setBucketSize(currentState.getBucketSize());
             resource.setBatchMessageMode(currentState.getBatchMessageMode());
-            if (resource.getStateModelDefRef() == null && !isTaskCache
-                || resource.getStateModelDefRef() != null && (
-                resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME) && isTaskCache
-                    || !resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME)
-                    && !isTaskCache)) {
+            if (!isTaskCache && (resource.getStateModelDefRef() == null
+                || !TaskConstants.STATE_MODEL_NAME.equals(resource.getStateModelDefRef()))) {
+              resourceToRebalance.put(resourceName, resource);
+            }
+
+            if (isTaskCache && TaskConstants.STATE_MODEL_NAME
+                .equals(resource.getStateModelDefRef())) {
+              // If a task current state exists without configs, it needs to be cleaned up
+              taskResourcesToDrop.put(resourceName, resource);

Review comment:
       What if the task's state is IN_PROGRESS or in a non-terminal state?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r481361556



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/task/TaskSchedulingStage.java
##########
@@ -110,117 +103,9 @@ private BestPossibleStateOutput compute(ClusterEvent event, Map<String, Resource
       restOfResources.remove(jobName);
     }
 
-    // Current rest of resources including: only current state left over ones
-    // Original resource map contains workflows + jobs + other invalid resources
-    // After removing workflows + jobs, only leftover ones will go over old rebalance pipeline.
-    for (Resource resource : restOfResources.values()) {

Review comment:
       Added new logic to drop task resources without configs. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r483840746



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -74,9 +79,8 @@ public void process(ClusterEvent event) throws Exception {
               cache.getResourceConfig(resourceName));
           resourceMap.put(resourceName, resource);
 
-          if (!idealState.isValid() && !isTaskCache
-              || idealState.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME) && isTaskCache
-              || !idealState.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME) && !isTaskCache) {
+          if (!isTaskCache && (!idealState.isValid() || !idealState.getStateModelDefRef()

Review comment:
       I'll add a comment. I don't have an answer to that question unfortunately; for the scope of this PR I'll leave it untouched. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] narendly commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

narendly commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r483232544



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -98,6 +101,43 @@ public void process(ClusterEvent event) throws Exception {
       }
     }
 
+    // Add TaskFramework resources from workflow and job configs as Task Framework will no longer
+    // use IdealState
+    if (isTaskCache) {
+      WorkflowControllerDataProvider taskDataCache =
+          event.getAttribute(AttributeName.ControllerDataProvider.name());
+      for (Map.Entry<String, WorkflowConfig> workflowConfigEntry : taskDataCache
+          .getWorkflowConfigMap().entrySet()) {
+        // always overwrite, because the resource could be created by IS
+        String resourceName = workflowConfigEntry.getKey();
+        WorkflowConfig workflowConfig = workflowConfigEntry.getValue();
+        addResourceConfigToResourceMap(resourceName, workflowConfig, cache.getClusterConfig(),
+            resourceMap, resourceToRebalance);
+        addPartition(resourceName, resourceName, resourceMap);
+      }
+
+      for (Map.Entry<String, JobConfig> jobConfigEntry : taskDataCache.getJobConfigMap()
+          .entrySet()) {
+        // always overwrite, because the resource could be created by IS

Review comment:
       Nit: the resource could "have been created" based on IS




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r483273500



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -185,4 +232,21 @@ private void addPartition(String partition, String resourceName, Map<String, Res
     resource.addPartition(partition);
 
   }
+
+  private void addResourceConfigToResourceMap(String resourceName, ResourceConfig resourceConfig,

Review comment:
       This PR doesn't change any of the map populating behavior: if a workflow/job was added to both maps in the past (based on IS), now they are added to both maps based on configs. - [With that said, your assessment on `resourceToRebalance` isn't entirely accurate: when the pipeline is for tasks, `resourceToRebalance` contains all the task resources as well.]
   
   The reason why we have both maps is because of the usages in other stages. For example, in `CurrentStateComputationStage`, task resources need to be present in both maps. There's definitely room for improvement in this aspect, but that would perhaps be in another PR. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] alirezazamani commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

alirezazamani commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r480294531



##########
File path: helix-core/src/main/java/org/apache/helix/controller/rebalancer/util/RebalanceScheduler.java
##########
@@ -80,6 +80,7 @@ public void scheduleRebalance(HelixManager manager, String resource, long startT
     long delay = startTime - System.currentTimeMillis();
     if (delay < 0) {
       LOG.debug(String.format("Delay time is %s, will not be scheduled", delay));
+      return;

Review comment:
       Good catch.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r484028447



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -98,6 +101,43 @@ public void process(ClusterEvent event) throws Exception {
       }
     }
 
+    // Add TaskFramework resources from workflow and job configs as Task Framework will no longer
+    // use IdealState
+    if (isTaskCache) {
+      WorkflowControllerDataProvider taskDataCache =
+          event.getAttribute(AttributeName.ControllerDataProvider.name());
+      for (Map.Entry<String, WorkflowConfig> workflowConfigEntry : taskDataCache
+          .getWorkflowConfigMap().entrySet()) {
+        // always overwrite, because the resource could be created by IS
+        String resourceName = workflowConfigEntry.getKey();
+        WorkflowConfig workflowConfig = workflowConfigEntry.getValue();
+        addResourceConfigToResourceMap(resourceName, workflowConfig, cache.getClusterConfig(),
+            resourceMap, resourceToRebalance);
+        addPartition(resourceName, resourceName, resourceMap);

Review comment:
       This is mimicking the behavior of IdealState-based resource computation: when Workflow configs are created, their MapFields contains their own workflow names (1 field), which will be added as a partition. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] narendly commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

narendly commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r483237466



##########
File path: helix-core/src/main/java/org/apache/helix/task/WorkflowDispatcher.java
##########
@@ -322,64 +328,6 @@ private void processJob(String job, CurrentStateOutput currentStateOutput,
     }
   }
 
-  /**
-   * Posts new job to cluster
-   */
-  private void scheduleSingleJob(String jobResource, JobConfig jobConfig) {
-    HelixAdmin admin = _manager.getClusterManagmentTool();
-
-    IdealState jobIS = admin.getResourceIdealState(_manager.getClusterName(), jobResource);
-    if (jobIS != null) {
-      LOG.info("Job " + jobResource + " idealstate already exists!");
-      return;
-    }
-
-    // Set up job resource based on partitions from target resource
-
-    // Create the UserContentStore for the job first
-    TaskUtil.createUserContent(_manager.getHelixPropertyStore(), jobResource,

Review comment:
       Agreed - we've been meaning to do this. We have an issue created for this work item I believe?

##########
File path: helix-core/src/main/java/org/apache/helix/task/WorkflowDispatcher.java
##########
@@ -322,64 +328,6 @@ private void processJob(String job, CurrentStateOutput currentStateOutput,
     }
   }
 
-  /**
-   * Posts new job to cluster
-   */
-  private void scheduleSingleJob(String jobResource, JobConfig jobConfig) {

Review comment:
       Yay! :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r487197712



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -185,4 +250,21 @@ private void addPartition(String partition, String resourceName, Map<String, Res
     resource.addPartition(partition);
 
   }
+
+  private void addResourceConfigToResourceMap(String resourceName, ResourceConfig resourceConfig,
+      ClusterConfig clusterConfig, Map<String, Resource> resourceMap,
+      Map<String, Resource> resourceToRebalance) {
+    Resource resource = new Resource(resourceName, clusterConfig, resourceConfig);
+    resourceMap.put(resourceName, resource);

Review comment:
       That is interesting. However, the next step of Task Framework will lead to the removal of this section of ResourceComputationStage - it no longer needs to create Resources from WorkflowConfigs/JobConfigs. (Check here https://github.com/apache/helix/wiki/Task-Framework-IdealState-Dependency-Removal-Progression) 
   
   I will put a TODO here in case the next step doesn't happen. 

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       `cleanupJobIdealStateExtView()` is kept for backward-compatibility purpose and is marked as deprecated now; it will be removed on a later phase. I suppose we put the on-demand rebalance in `cleanupJobIdealStateExtView()` now, and move it out when we delete `cleanupJobIdealStateExtView()`. Do you think that's necessary @jiajunwang ? Seems not to be too much difference. 

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       `cleanupJobIdealStateExtView()` is kept for backward-compatibility purpose and is marked as deprecated now; it will be removed on a later phase (it could trigger rebalance if the participant side is not updated, and it wouldn't trigger rebalance if the participant side is updated). I suppose we put the on-demand rebalance in `cleanupJobIdealStateExtView()` now, and move it out when we delete `cleanupJobIdealStateExtView()`. Do you think that's necessary @jiajunwang ? Seems not to be too much difference. 

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -124,17 +205,15 @@ public void process(ClusterEvent event) throws Exception {
 
           // don't overwrite ideal state settings
           if (!resourceMap.containsKey(resourceName)) {
-            addResource(resourceName, resourceMap);
-            Resource resource = resourceMap.get(resourceName);
+            Resource resource = new Resource(resourceName);
             resource.setStateModelDefRef(currentState.getStateModelDefRef());
             resource.setStateModelFactoryName(currentState.getStateModelFactoryName());
             resource.setBucketSize(currentState.getBucketSize());
             resource.setBatchMessageMode(currentState.getBatchMessageMode());
-            if (resource.getStateModelDefRef() == null && !isTaskCache
-                || resource.getStateModelDefRef() != null && (
-                resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME) && isTaskCache
-                    || !resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME)
-                    && !isTaskCache)) {
+            if (!isTaskCache && !TaskConstants.STATE_MODEL_NAME

Review comment:
       That's very smart - updating. 

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       `cleanupJobIdealStateExtView()` is kept for backward-compatibility purpose and is marked as deprecated now; it will be removed on a later phase (it could trigger rebalance if the participant side is not updated, and it wouldn't trigger rebalance if the participant side is updated). I suppose we can put the on-demand rebalance in `cleanupJobIdealStateExtView()` now, and move it out when we delete `cleanupJobIdealStateExtView()`. Do you think that's necessary @jiajunwang ? Seems not to be too much difference. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] jiajunwang commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

jiajunwang commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r486786354



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -124,17 +205,15 @@ public void process(ClusterEvent event) throws Exception {
 
           // don't overwrite ideal state settings
           if (!resourceMap.containsKey(resourceName)) {
-            addResource(resourceName, resourceMap);
-            Resource resource = resourceMap.get(resourceName);
+            Resource resource = new Resource(resourceName);
             resource.setStateModelDefRef(currentState.getStateModelDefRef());
             resource.setStateModelFactoryName(currentState.getStateModelFactoryName());
             resource.setBucketSize(currentState.getBucketSize());
             resource.setBatchMessageMode(currentState.getBatchMessageMode());
-            if (resource.getStateModelDefRef() == null && !isTaskCache
-                || resource.getStateModelDefRef() != null && (
-                resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME) && isTaskCache
-                    || !resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME)
-                    && !isTaskCache)) {
+            if (!isTaskCache && !TaskConstants.STATE_MODEL_NAME

Review comment:
       How about "isTaskCache == TaskConstants.STATE_MODEL_NAME.equals(resource.getStateModelDefRef())"
   You can comment that we are trying to match for 2 conditions:
   1. If resource state model def is null, then we only proceed if it is a regular resource pipeline.
   2. If resource state model def is not null, then we only proceed if the pipeline type matches the resource state model type.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -98,6 +102,15 @@ public void process(ClusterEvent event) throws Exception {
       }
     }

Review comment:
       Yeah, that's not too bad.
   I was thinking if the RESOURCES attribute is still necessary. But might be. For example, for the targeted jobs. So let's keep it for now.

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       Then, can we add scheduleOnDemandPipeline into cleanupJobIdealStateExtView() ? Otherwise, cleanupJobIdealStateExtView is not a complete call that triggers the expected rebalance, right?
   It also helps to simplify code, IMO.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -185,4 +250,21 @@ private void addPartition(String partition, String resourceName, Map<String, Res
     resource.addPartition(partition);
 
   }
+
+  private void addResourceConfigToResourceMap(String resourceName, ResourceConfig resourceConfig,
+      ClusterConfig clusterConfig, Map<String, Resource> resourceMap,
+      Map<String, Resource> resourceToRebalance) {
+    Resource resource = new Resource(resourceName, clusterConfig, resourceConfig);
+    resourceMap.put(resourceName, resource);

Review comment:
       Not a must, but we did add a IS to ResourceConfig conversion method. It is ResourceConfig.mergeIdealStateWithResourceConfig(). So in theory, you can keep only this method by converting the IS into ResourceConfig, then use this method to add to the resource map.
   1. This helps to reduce duplicate code.
   2. ResourceConfig is the one we are going to use in the near future. IS will be transformed for controller output only. So that logic will be replaced anyway.
   
   Up to you if you want to do it now or in the future. If in the future, please add a TODO here.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -185,4 +250,21 @@ private void addPartition(String partition, String resourceName, Map<String, Res
     resource.addPartition(partition);
 
   }
+
+  private void addResourceConfigToResourceMap(String resourceName, ResourceConfig resourceConfig,
+      ClusterConfig clusterConfig, Map<String, Resource> resourceMap,
+      Map<String, Resource> resourceToRebalance) {
+    Resource resource = new Resource(resourceName, clusterConfig, resourceConfig);
+    resourceMap.put(resourceName, resource);

Review comment:
       Sure, that also makes sense.

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       This is necessary. So no matter who does the cleanup later, he or she won't forget to remove the scheduleOnDemandPipeline(). High-level, these 2 logics are bundled. And we shall not assume everyone remembers this relationship by their memory or document. Organize the code to ensure it.

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       As we discussed, please add a TODO there for the future change. For now, it is OK to be left here.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] alirezazamani merged pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

alirezazamani merged pull request #1326:
URL: https://github.com/apache/helix/pull/1326


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] jiajunwang commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

jiajunwang commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r487238608



##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       This is necessary. So no matter who does the cleanup later, he or she won't forget to remove the scheduleOnDemandPipeline(). High-level, these 2 logics are bundled. And we shall not assume everyone remembers this relationship by their memory or document. Organize the code to ensure it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] dasahcc commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

dasahcc commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r481358790



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -98,6 +101,43 @@ public void process(ClusterEvent event) throws Exception {
       }
     }
 
+    // Add TaskFramework resources from workflow and job configs as Task Framework will no longer
+    // use IdealState
+    if (isTaskCache) {

Review comment:
       Do we need this? We already based on all Workflows to do the assignment. Originally have this is for backward compatible change.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] narendly commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

narendly commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r483225266



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -59,6 +62,7 @@ public void process(ClusterEvent event) throws Exception {
 
     Map<String, Resource> resourceMap = new LinkedHashMap<>();
     Map<String, Resource> resourceToRebalance = new LinkedHashMap<>();
+    Map<String, Resource> taskResourcesToDrop = new LinkedHashMap<>();

Review comment:
       Does this need to be a LinkedHashMap?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r483841145



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/AttributeName.java
##########
@@ -22,6 +22,7 @@
 public enum AttributeName {
   RESOURCES,
   RESOURCES_TO_REBALANCE,
+  TASK_RESOURCES_TO_DROP,

Review comment:
       It's not about ideal state but rather configs. However with that said, I think you raised a good point; I can just check the cache. Let me explore in that direction. Thanks!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r483843385



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -98,6 +102,15 @@ public void process(ClusterEvent event) throws Exception {
       }
     }

Review comment:
       One problem is that the task pipeline still needs the regular resource objects due to targeted jobs. Therefore the task pipeline resource-computation needs everything from resource pipeline resource-computation; it makes sense for task pipeline to be "resource pipeline resource computation + reading configs", which is the current logic. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r483273500



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -185,4 +232,21 @@ private void addPartition(String partition, String resourceName, Map<String, Res
     resource.addPartition(partition);
 
   }
+
+  private void addResourceConfigToResourceMap(String resourceName, ResourceConfig resourceConfig,

Review comment:
       This PR doesn't change any of the map populating behavior: if a workflow/job was added to both maps in the past (based on IS), now they are added to both maps based on configs. <s>With that said, your assessment on `resourceToRebalance` isn't entirely accurate: when the pipeline is for tasks, `resourceToRebalance` contains all the task resources as well.</s> Edit: the original comment is referring to the resource pipeline and is correct. 
   
   The reason why we have both maps is because of the usages in other stages. For example, in `CurrentStateComputationStage`, task resources need to be present in both maps. There's definitely room for improvement in this aspect, but that would perhaps be in another PR. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r483273500



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -185,4 +232,21 @@ private void addPartition(String partition, String resourceName, Map<String, Res
     resource.addPartition(partition);
 
   }
+
+  private void addResourceConfigToResourceMap(String resourceName, ResourceConfig resourceConfig,

Review comment:
       This PR doesn't change any of the map populating behavior: if a workflow/job was added to both maps in the past (based on IS), now they are added to both maps based on configs. <s>With that said, your assessment on `resourceToRebalance` isn't entirely accurate: when the pipeline is for tasks, `resourceToRebalance` contains all the task resources as well.<s>
   
   The reason why we have both maps is because of the usages in other stages. For example, in `CurrentStateComputationStage`, task resources need to be present in both maps. There's definitely room for improvement in this aspect, but that would perhaps be in another PR. 

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -185,4 +232,21 @@ private void addPartition(String partition, String resourceName, Map<String, Res
     resource.addPartition(partition);
 
   }
+
+  private void addResourceConfigToResourceMap(String resourceName, ResourceConfig resourceConfig,

Review comment:
       This PR doesn't change any of the map populating behavior: if a workflow/job was added to both maps in the past (based on IS), now they are added to both maps based on configs. <s>With that said, your assessment on `resourceToRebalance` isn't entirely accurate: when the pipeline is for tasks, `resourceToRebalance` contains all the task resources as well.</s>
   
   The reason why we have both maps is because of the usages in other stages. For example, in `CurrentStateComputationStage`, task resources need to be present in both maps. There's definitely room for improvement in this aspect, but that would perhaps be in another PR. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 edited a comment on pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 edited a comment on pull request #1326:
URL: https://github.com/apache/helix/pull/1326#issuecomment-692323898


   This PR is ready to be merged, approved by @jiajunwang 
   Final commit message:
   ## Remove IdealState Dependency from Task Framework ##
   This PR removes IdealState usage from Task Framework. The participant-side no longer creates IdealState when workflows/jobs are created. The controller-side no longer reads IdealState to create resources for Task Framework. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r487197712



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -185,4 +250,21 @@ private void addPartition(String partition, String resourceName, Map<String, Res
     resource.addPartition(partition);
 
   }
+
+  private void addResourceConfigToResourceMap(String resourceName, ResourceConfig resourceConfig,
+      ClusterConfig clusterConfig, Map<String, Resource> resourceMap,
+      Map<String, Resource> resourceToRebalance) {
+    Resource resource = new Resource(resourceName, clusterConfig, resourceConfig);
+    resourceMap.put(resourceName, resource);

Review comment:
       That is interesting. However, the next step of Task Framework will lead to the removal of this section of ResourceComputationStage - it no longer needs to create Resources from WorkflowConfigs/JobConfigs. (Check here https://github.com/apache/helix/wiki/Task-Framework-IdealState-Dependency-Removal-Progression) 
   
   I will put a TODO here in case the next step doesn't happen. 

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       `cleanupJobIdealStateExtView()` is kept for backward-compatibility purpose and is marked as deprecated now; it will be removed on a later phase. I suppose we put the on-demand rebalance in `cleanupJobIdealStateExtView()` now, and move it out when we delete `cleanupJobIdealStateExtView()`. Do you think that's necessary @jiajunwang ? Seems not to be too much difference. 

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       `cleanupJobIdealStateExtView()` is kept for backward-compatibility purpose and is marked as deprecated now; it will be removed on a later phase (it could trigger rebalance if the participant side is not updated, and it wouldn't trigger rebalance if the participant side is updated). I suppose we put the on-demand rebalance in `cleanupJobIdealStateExtView()` now, and move it out when we delete `cleanupJobIdealStateExtView()`. Do you think that's necessary @jiajunwang ? Seems not to be too much difference. 

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -124,17 +205,15 @@ public void process(ClusterEvent event) throws Exception {
 
           // don't overwrite ideal state settings
           if (!resourceMap.containsKey(resourceName)) {
-            addResource(resourceName, resourceMap);
-            Resource resource = resourceMap.get(resourceName);
+            Resource resource = new Resource(resourceName);
             resource.setStateModelDefRef(currentState.getStateModelDefRef());
             resource.setStateModelFactoryName(currentState.getStateModelFactoryName());
             resource.setBucketSize(currentState.getBucketSize());
             resource.setBatchMessageMode(currentState.getBatchMessageMode());
-            if (resource.getStateModelDefRef() == null && !isTaskCache
-                || resource.getStateModelDefRef() != null && (
-                resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME) && isTaskCache
-                    || !resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME)
-                    && !isTaskCache)) {
+            if (!isTaskCache && !TaskConstants.STATE_MODEL_NAME

Review comment:
       That's very smart - updating. 

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       `cleanupJobIdealStateExtView()` is kept for backward-compatibility purpose and is marked as deprecated now; it will be removed on a later phase (it could trigger rebalance if the participant side is not updated, and it wouldn't trigger rebalance if the participant side is updated). I suppose we can put the on-demand rebalance in `cleanupJobIdealStateExtView()` now, and move it out when we delete `cleanupJobIdealStateExtView()`. Do you think that's necessary @jiajunwang ? Seems not to be too much difference. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r484025889



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -130,11 +143,15 @@ public void process(ClusterEvent event) throws Exception {
             resource.setStateModelFactoryName(currentState.getStateModelFactoryName());
             resource.setBucketSize(currentState.getBucketSize());
             resource.setBatchMessageMode(currentState.getBatchMessageMode());
-            if (resource.getStateModelDefRef() == null && !isTaskCache
-                || resource.getStateModelDefRef() != null && (
-                resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME) && isTaskCache
-                    || !resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME)
-                    && !isTaskCache)) {
+            if (!isTaskCache && (resource.getStateModelDefRef() == null
+                || !TaskConstants.STATE_MODEL_NAME.equals(resource.getStateModelDefRef()))) {
+              resourceToRebalance.put(resourceName, resource);

Review comment:
       1. It has been removed, thanks for the suggestion. 
   2. We still need an if statement: if not task pipeline and state model name is not TASK, or, if is task pipeline and state model name is TASK. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] alirezazamani commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

alirezazamani commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r483226238



##########
File path: helix-core/src/main/java/org/apache/helix/controller/rebalancer/util/RebalanceScheduler.java
##########
@@ -80,6 +80,7 @@ public void scheduleRebalance(HelixManager manager, String resource, long startT
     long delay = startTime - System.currentTimeMillis();
     if (delay < 0) {
       LOG.debug(String.format("Delay time is %s, will not be scheduled", delay));
+      return;

Review comment:
       In my opinion, if delay is less than 0, we should not schedule rebalance. I will let Neal to convince you though. But anyways it is not the concern of this PR.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r483849489



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -185,4 +250,21 @@ private void addPartition(String partition, String resourceName, Map<String, Res
     resource.addPartition(partition);
 
   }
+
+  private void addResourceConfigToResourceMap(String resourceName, ResourceConfig resourceConfig,
+      ClusterConfig clusterConfig, Map<String, Resource> resourceMap,
+      Map<String, Resource> resourceToRebalance) {
+    Resource resource = new Resource(resourceName, clusterConfig, resourceConfig);
+    resourceMap.put(resourceName, resource);

Review comment:
       Unfortunately, with the way IdealState and CurrentState are implemented, although they share similar methods and the same parent class, they cannot be generalized as the similar methods are implemented in the sub classes individually. The best I can do based on my knowledge is to refactor. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] jiajunwang commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

jiajunwang commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r486545228



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -98,6 +102,15 @@ public void process(ClusterEvent event) throws Exception {
       }
     }

Review comment:
       Then let's re-org the logic a little bit. So they are not interleaved.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] jiajunwang commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

jiajunwang commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r487236895



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -185,4 +250,21 @@ private void addPartition(String partition, String resourceName, Map<String, Res
     resource.addPartition(partition);
 
   }
+
+  private void addResourceConfigToResourceMap(String resourceName, ResourceConfig resourceConfig,
+      ClusterConfig clusterConfig, Map<String, Resource> resourceMap,
+      Map<String, Resource> resourceToRebalance) {
+    Resource resource = new Resource(resourceName, clusterConfig, resourceConfig);
+    resourceMap.put(resourceName, resource);

Review comment:
       Sure, that also makes sense.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] jiajunwang commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

jiajunwang commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r483221807



##########
File path: helix-core/src/main/java/org/apache/helix/controller/rebalancer/util/RebalanceScheduler.java
##########
@@ -80,6 +80,7 @@ public void scheduleRebalance(HelixManager manager, String resource, long startT
     long delay = startTime - System.currentTimeMillis();
     if (delay < 0) {
       LOG.debug(String.format("Delay time is %s, will not be scheduled", delay));
+      return;

Review comment:
       I'm not convinced this should not be scheduled. I believe the expected behavior is triggering the rebalance immediately. In this case, we shall fix the log not the logic.
   
   Overall, if the caller wants to schedule a rebalance, then it should be done. It is the caller responsibility to decide whether it is necessary. The scheduler should ensure it happens. Otherwise, we may miss the event by not doing anything. That will be a real bug.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r478726856



##########
File path: helix-core/src/main/java/org/apache/helix/task/WorkflowDispatcher.java
##########
@@ -322,64 +328,6 @@ private void processJob(String job, CurrentStateOutput currentStateOutput,
     }
   }
 
-  /**
-   * Posts new job to cluster
-   */
-  private void scheduleSingleJob(String jobResource, JobConfig jobConfig) {
-    HelixAdmin admin = _manager.getClusterManagmentTool();
-
-    IdealState jobIS = admin.getResourceIdealState(_manager.getClusterName(), jobResource);
-    if (jobIS != null) {
-      LOG.info("Job " + jobResource + " idealstate already exists!");
-      return;
-    }
-
-    // Set up job resource based on partitions from target resource
-
-    // Create the UserContentStore for the job first
-    TaskUtil.createUserContent(_manager.getHelixPropertyStore(), jobResource,

Review comment:
       This line is already done in processJob() when the job is first started, making this logic redundant. Therefore it's removed. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r478728537



##########
File path: helix-core/src/test/java/org/apache/helix/task/TestGetLastScheduledTaskExecInfo.java
##########
@@ -75,6 +75,10 @@ public void testGetLastScheduledTaskExecInfo() throws Exception {
 
     // Stop and delete the queue
     _driver.stop(queueName);
+    TestHelper.verify(() -> {

Review comment:
       Test bug fix: if the job queue is deleted right after stopping, it is possible that current states are not dropped on the instances. This is because the RUNNING to STOPPED message may still be pending when STOPPED to DROPPED message is generated, meaning the latter message is not sent; however, during the next pipeline cycle the job queue may have been deleted already, so the STOPPED to DROPPED message is never generated again. Adding this check to make the test non-flaky. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r487197712



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -185,4 +250,21 @@ private void addPartition(String partition, String resourceName, Map<String, Res
     resource.addPartition(partition);
 
   }
+
+  private void addResourceConfigToResourceMap(String resourceName, ResourceConfig resourceConfig,
+      ClusterConfig clusterConfig, Map<String, Resource> resourceMap,
+      Map<String, Resource> resourceToRebalance) {
+    Resource resource = new Resource(resourceName, clusterConfig, resourceConfig);
+    resourceMap.put(resourceName, resource);

Review comment:
       That is interesting. However, the next step of Task Framework will lead to the removal of this section of ResourceComputationStage - it no longer needs to create Resources from WorkflowConfigs/JobConfigs. (Check here https://github.com/apache/helix/wiki/Task-Framework-IdealState-Dependency-Removal-Progression) 
   
   I will put a TODO here in case the next step doesn't happen. 

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       `cleanupJobIdealStateExtView()` is kept for backward-compatibility purpose and is marked as deprecated now; it will be removed on a later phase. I suppose we put the on-demand rebalance in `cleanupJobIdealStateExtView()` now, and move it out when we delete `cleanupJobIdealStateExtView()`. Do you think that's necessary @jiajunwang ? Seems not to be too much difference. 

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       `cleanupJobIdealStateExtView()` is kept for backward-compatibility purpose and is marked as deprecated now; it will be removed on a later phase (it could trigger rebalance if the participant side is not updated, and it wouldn't trigger rebalance if the participant side is updated). I suppose we put the on-demand rebalance in `cleanupJobIdealStateExtView()` now, and move it out when we delete `cleanupJobIdealStateExtView()`. Do you think that's necessary @jiajunwang ? Seems not to be too much difference. 

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -124,17 +205,15 @@ public void process(ClusterEvent event) throws Exception {
 
           // don't overwrite ideal state settings
           if (!resourceMap.containsKey(resourceName)) {
-            addResource(resourceName, resourceMap);
-            Resource resource = resourceMap.get(resourceName);
+            Resource resource = new Resource(resourceName);
             resource.setStateModelDefRef(currentState.getStateModelDefRef());
             resource.setStateModelFactoryName(currentState.getStateModelFactoryName());
             resource.setBucketSize(currentState.getBucketSize());
             resource.setBatchMessageMode(currentState.getBatchMessageMode());
-            if (resource.getStateModelDefRef() == null && !isTaskCache
-                || resource.getStateModelDefRef() != null && (
-                resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME) && isTaskCache
-                    || !resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME)
-                    && !isTaskCache)) {
+            if (!isTaskCache && !TaskConstants.STATE_MODEL_NAME

Review comment:
       That's very smart - updating. 

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       `cleanupJobIdealStateExtView()` is kept for backward-compatibility purpose and is marked as deprecated now; it will be removed on a later phase (it could trigger rebalance if the participant side is not updated, and it wouldn't trigger rebalance if the participant side is updated). I suppose we can put the on-demand rebalance in `cleanupJobIdealStateExtView()` now, and move it out when we delete `cleanupJobIdealStateExtView()`. Do you think that's necessary @jiajunwang ? Seems not to be too much difference. 

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -185,4 +250,21 @@ private void addPartition(String partition, String resourceName, Map<String, Res
     resource.addPartition(partition);
 
   }
+
+  private void addResourceConfigToResourceMap(String resourceName, ResourceConfig resourceConfig,
+      ClusterConfig clusterConfig, Map<String, Resource> resourceMap,
+      Map<String, Resource> resourceToRebalance) {
+    Resource resource = new Resource(resourceName, clusterConfig, resourceConfig);
+    resourceMap.put(resourceName, resource);

Review comment:
       That is interesting. However, the next step of Task Framework will lead to the removal of this section of ResourceComputationStage - it no longer needs to create Resources from WorkflowConfigs/JobConfigs. (Check here https://github.com/apache/helix/wiki/Task-Framework-IdealState-Dependency-Removal-Progression) 
   
   I will put a TODO here in case the next step doesn't happen. 

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       `cleanupJobIdealStateExtView()` is kept for backward-compatibility purpose and is marked as deprecated now; it will be removed on a later phase. I suppose we put the on-demand rebalance in `cleanupJobIdealStateExtView()` now, and move it out when we delete `cleanupJobIdealStateExtView()`. Do you think that's necessary @jiajunwang ? Seems not to be too much difference. 

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       `cleanupJobIdealStateExtView()` is kept for backward-compatibility purpose and is marked as deprecated now; it will be removed on a later phase (it could trigger rebalance if the participant side is not updated, and it wouldn't trigger rebalance if the participant side is updated). I suppose we put the on-demand rebalance in `cleanupJobIdealStateExtView()` now, and move it out when we delete `cleanupJobIdealStateExtView()`. Do you think that's necessary @jiajunwang ? Seems not to be too much difference. 

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -124,17 +205,15 @@ public void process(ClusterEvent event) throws Exception {
 
           // don't overwrite ideal state settings
           if (!resourceMap.containsKey(resourceName)) {
-            addResource(resourceName, resourceMap);
-            Resource resource = resourceMap.get(resourceName);
+            Resource resource = new Resource(resourceName);
             resource.setStateModelDefRef(currentState.getStateModelDefRef());
             resource.setStateModelFactoryName(currentState.getStateModelFactoryName());
             resource.setBucketSize(currentState.getBucketSize());
             resource.setBatchMessageMode(currentState.getBatchMessageMode());
-            if (resource.getStateModelDefRef() == null && !isTaskCache
-                || resource.getStateModelDefRef() != null && (
-                resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME) && isTaskCache
-                    || !resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME)
-                    && !isTaskCache)) {
+            if (!isTaskCache && !TaskConstants.STATE_MODEL_NAME

Review comment:
       That's very smart - updating. 

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       `cleanupJobIdealStateExtView()` is kept for backward-compatibility purpose and is marked as deprecated now; it will be removed on a later phase (it could trigger rebalance if the participant side is not updated, and it wouldn't trigger rebalance if the participant side is updated). I suppose we can put the on-demand rebalance in `cleanupJobIdealStateExtView()` now, and move it out when we delete `cleanupJobIdealStateExtView()`. Do you think that's necessary @jiajunwang ? Seems not to be too much difference. 

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -185,4 +250,21 @@ private void addPartition(String partition, String resourceName, Map<String, Res
     resource.addPartition(partition);
 
   }
+
+  private void addResourceConfigToResourceMap(String resourceName, ResourceConfig resourceConfig,
+      ClusterConfig clusterConfig, Map<String, Resource> resourceMap,
+      Map<String, Resource> resourceToRebalance) {
+    Resource resource = new Resource(resourceName, clusterConfig, resourceConfig);
+    resourceMap.put(resourceName, resource);

Review comment:
       That is interesting. However, the next step of Task Framework will lead to the removal of this section of ResourceComputationStage - it no longer needs to create Resources from WorkflowConfigs/JobConfigs. (Check here https://github.com/apache/helix/wiki/Task-Framework-IdealState-Dependency-Removal-Progression) 
   
   I will put a TODO here in case the next step doesn't happen. 

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       `cleanupJobIdealStateExtView()` is kept for backward-compatibility purpose and is marked as deprecated now; it will be removed on a later phase. I suppose we put the on-demand rebalance in `cleanupJobIdealStateExtView()` now, and move it out when we delete `cleanupJobIdealStateExtView()`. Do you think that's necessary @jiajunwang ? Seems not to be too much difference. 

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       `cleanupJobIdealStateExtView()` is kept for backward-compatibility purpose and is marked as deprecated now; it will be removed on a later phase (it could trigger rebalance if the participant side is not updated, and it wouldn't trigger rebalance if the participant side is updated). I suppose we put the on-demand rebalance in `cleanupJobIdealStateExtView()` now, and move it out when we delete `cleanupJobIdealStateExtView()`. Do you think that's necessary @jiajunwang ? Seems not to be too much difference. 

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -124,17 +205,15 @@ public void process(ClusterEvent event) throws Exception {
 
           // don't overwrite ideal state settings
           if (!resourceMap.containsKey(resourceName)) {
-            addResource(resourceName, resourceMap);
-            Resource resource = resourceMap.get(resourceName);
+            Resource resource = new Resource(resourceName);
             resource.setStateModelDefRef(currentState.getStateModelDefRef());
             resource.setStateModelFactoryName(currentState.getStateModelFactoryName());
             resource.setBucketSize(currentState.getBucketSize());
             resource.setBatchMessageMode(currentState.getBatchMessageMode());
-            if (resource.getStateModelDefRef() == null && !isTaskCache
-                || resource.getStateModelDefRef() != null && (
-                resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME) && isTaskCache
-                    || !resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME)
-                    && !isTaskCache)) {
+            if (!isTaskCache && !TaskConstants.STATE_MODEL_NAME

Review comment:
       That's very smart - updating. 

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       `cleanupJobIdealStateExtView()` is kept for backward-compatibility purpose and is marked as deprecated now; it will be removed on a later phase (it could trigger rebalance if the participant side is not updated, and it wouldn't trigger rebalance if the participant side is updated). I suppose we can put the on-demand rebalance in `cleanupJobIdealStateExtView()` now, and move it out when we delete `cleanupJobIdealStateExtView()`. Do you think that's necessary @jiajunwang ? Seems not to be too much difference. 

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -185,4 +250,21 @@ private void addPartition(String partition, String resourceName, Map<String, Res
     resource.addPartition(partition);
 
   }
+
+  private void addResourceConfigToResourceMap(String resourceName, ResourceConfig resourceConfig,
+      ClusterConfig clusterConfig, Map<String, Resource> resourceMap,
+      Map<String, Resource> resourceToRebalance) {
+    Resource resource = new Resource(resourceName, clusterConfig, resourceConfig);
+    resourceMap.put(resourceName, resource);

Review comment:
       That is interesting. However, the next step of Task Framework will lead to the removal of this section of ResourceComputationStage - it no longer needs to create Resources from WorkflowConfigs/JobConfigs. (Check here https://github.com/apache/helix/wiki/Task-Framework-IdealState-Dependency-Removal-Progression) 
   
   I will put a TODO here in case the next step doesn't happen. 

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       `cleanupJobIdealStateExtView()` is kept for backward-compatibility purpose and is marked as deprecated now; it will be removed on a later phase. I suppose we put the on-demand rebalance in `cleanupJobIdealStateExtView()` now, and move it out when we delete `cleanupJobIdealStateExtView()`. Do you think that's necessary @jiajunwang ? Seems not to be too much difference. 

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       `cleanupJobIdealStateExtView()` is kept for backward-compatibility purpose and is marked as deprecated now; it will be removed on a later phase (it could trigger rebalance if the participant side is not updated, and it wouldn't trigger rebalance if the participant side is updated). I suppose we put the on-demand rebalance in `cleanupJobIdealStateExtView()` now, and move it out when we delete `cleanupJobIdealStateExtView()`. Do you think that's necessary @jiajunwang ? Seems not to be too much difference. 

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -124,17 +205,15 @@ public void process(ClusterEvent event) throws Exception {
 
           // don't overwrite ideal state settings
           if (!resourceMap.containsKey(resourceName)) {
-            addResource(resourceName, resourceMap);
-            Resource resource = resourceMap.get(resourceName);
+            Resource resource = new Resource(resourceName);
             resource.setStateModelDefRef(currentState.getStateModelDefRef());
             resource.setStateModelFactoryName(currentState.getStateModelFactoryName());
             resource.setBucketSize(currentState.getBucketSize());
             resource.setBatchMessageMode(currentState.getBatchMessageMode());
-            if (resource.getStateModelDefRef() == null && !isTaskCache
-                || resource.getStateModelDefRef() != null && (
-                resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME) && isTaskCache
-                    || !resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME)
-                    && !isTaskCache)) {
+            if (!isTaskCache && !TaskConstants.STATE_MODEL_NAME

Review comment:
       That's very smart - updating. 

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       `cleanupJobIdealStateExtView()` is kept for backward-compatibility purpose and is marked as deprecated now; it will be removed on a later phase (it could trigger rebalance if the participant side is not updated, and it wouldn't trigger rebalance if the participant side is updated). I suppose we can put the on-demand rebalance in `cleanupJobIdealStateExtView()` now, and move it out when we delete `cleanupJobIdealStateExtView()`. Do you think that's necessary @jiajunwang ? Seems not to be too much difference. 

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -185,4 +250,21 @@ private void addPartition(String partition, String resourceName, Map<String, Res
     resource.addPartition(partition);
 
   }
+
+  private void addResourceConfigToResourceMap(String resourceName, ResourceConfig resourceConfig,
+      ClusterConfig clusterConfig, Map<String, Resource> resourceMap,
+      Map<String, Resource> resourceToRebalance) {
+    Resource resource = new Resource(resourceName, clusterConfig, resourceConfig);
+    resourceMap.put(resourceName, resource);

Review comment:
       That is interesting. However, the next step of Task Framework will lead to the removal of this section of ResourceComputationStage - it no longer needs to create Resources from WorkflowConfigs/JobConfigs. (Check here https://github.com/apache/helix/wiki/Task-Framework-IdealState-Dependency-Removal-Progression) 
   
   I will put a TODO here in case the next step doesn't happen. 

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       `cleanupJobIdealStateExtView()` is kept for backward-compatibility purpose and is marked as deprecated now; it will be removed on a later phase. I suppose we put the on-demand rebalance in `cleanupJobIdealStateExtView()` now, and move it out when we delete `cleanupJobIdealStateExtView()`. Do you think that's necessary @jiajunwang ? Seems not to be too much difference. 

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       `cleanupJobIdealStateExtView()` is kept for backward-compatibility purpose and is marked as deprecated now; it will be removed on a later phase (it could trigger rebalance if the participant side is not updated, and it wouldn't trigger rebalance if the participant side is updated). I suppose we put the on-demand rebalance in `cleanupJobIdealStateExtView()` now, and move it out when we delete `cleanupJobIdealStateExtView()`. Do you think that's necessary @jiajunwang ? Seems not to be too much difference. 

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -124,17 +205,15 @@ public void process(ClusterEvent event) throws Exception {
 
           // don't overwrite ideal state settings
           if (!resourceMap.containsKey(resourceName)) {
-            addResource(resourceName, resourceMap);
-            Resource resource = resourceMap.get(resourceName);
+            Resource resource = new Resource(resourceName);
             resource.setStateModelDefRef(currentState.getStateModelDefRef());
             resource.setStateModelFactoryName(currentState.getStateModelFactoryName());
             resource.setBucketSize(currentState.getBucketSize());
             resource.setBatchMessageMode(currentState.getBatchMessageMode());
-            if (resource.getStateModelDefRef() == null && !isTaskCache
-                || resource.getStateModelDefRef() != null && (
-                resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME) && isTaskCache
-                    || !resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME)
-                    && !isTaskCache)) {
+            if (!isTaskCache && !TaskConstants.STATE_MODEL_NAME

Review comment:
       That's very smart - updating. 

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       `cleanupJobIdealStateExtView()` is kept for backward-compatibility purpose and is marked as deprecated now; it will be removed on a later phase (it could trigger rebalance if the participant side is not updated, and it wouldn't trigger rebalance if the participant side is updated). I suppose we can put the on-demand rebalance in `cleanupJobIdealStateExtView()` now, and move it out when we delete `cleanupJobIdealStateExtView()`. Do you think that's necessary @jiajunwang ? Seems not to be too much difference. 

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -185,4 +250,21 @@ private void addPartition(String partition, String resourceName, Map<String, Res
     resource.addPartition(partition);
 
   }
+
+  private void addResourceConfigToResourceMap(String resourceName, ResourceConfig resourceConfig,
+      ClusterConfig clusterConfig, Map<String, Resource> resourceMap,
+      Map<String, Resource> resourceToRebalance) {
+    Resource resource = new Resource(resourceName, clusterConfig, resourceConfig);
+    resourceMap.put(resourceName, resource);

Review comment:
       That is interesting. However, the next step of Task Framework will lead to the removal of this section of ResourceComputationStage - it no longer needs to create Resources from WorkflowConfigs/JobConfigs. (Check here https://github.com/apache/helix/wiki/Task-Framework-IdealState-Dependency-Removal-Progression) 
   
   I will put a TODO here in case the next step doesn't happen. 

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       `cleanupJobIdealStateExtView()` is kept for backward-compatibility purpose and is marked as deprecated now; it will be removed on a later phase. I suppose we put the on-demand rebalance in `cleanupJobIdealStateExtView()` now, and move it out when we delete `cleanupJobIdealStateExtView()`. Do you think that's necessary @jiajunwang ? Seems not to be too much difference. 

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       `cleanupJobIdealStateExtView()` is kept for backward-compatibility purpose and is marked as deprecated now; it will be removed on a later phase (it could trigger rebalance if the participant side is not updated, and it wouldn't trigger rebalance if the participant side is updated). I suppose we put the on-demand rebalance in `cleanupJobIdealStateExtView()` now, and move it out when we delete `cleanupJobIdealStateExtView()`. Do you think that's necessary @jiajunwang ? Seems not to be too much difference. 

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -124,17 +205,15 @@ public void process(ClusterEvent event) throws Exception {
 
           // don't overwrite ideal state settings
           if (!resourceMap.containsKey(resourceName)) {
-            addResource(resourceName, resourceMap);
-            Resource resource = resourceMap.get(resourceName);
+            Resource resource = new Resource(resourceName);
             resource.setStateModelDefRef(currentState.getStateModelDefRef());
             resource.setStateModelFactoryName(currentState.getStateModelFactoryName());
             resource.setBucketSize(currentState.getBucketSize());
             resource.setBatchMessageMode(currentState.getBatchMessageMode());
-            if (resource.getStateModelDefRef() == null && !isTaskCache
-                || resource.getStateModelDefRef() != null && (
-                resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME) && isTaskCache
-                    || !resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME)
-                    && !isTaskCache)) {
+            if (!isTaskCache && !TaskConstants.STATE_MODEL_NAME

Review comment:
       That's very smart - updating. 

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       `cleanupJobIdealStateExtView()` is kept for backward-compatibility purpose and is marked as deprecated now; it will be removed on a later phase (it could trigger rebalance if the participant side is not updated, and it wouldn't trigger rebalance if the participant side is updated). I suppose we can put the on-demand rebalance in `cleanupJobIdealStateExtView()` now, and move it out when we delete `cleanupJobIdealStateExtView()`. Do you think that's necessary @jiajunwang ? Seems not to be too much difference. 

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -185,4 +250,21 @@ private void addPartition(String partition, String resourceName, Map<String, Res
     resource.addPartition(partition);
 
   }
+
+  private void addResourceConfigToResourceMap(String resourceName, ResourceConfig resourceConfig,
+      ClusterConfig clusterConfig, Map<String, Resource> resourceMap,
+      Map<String, Resource> resourceToRebalance) {
+    Resource resource = new Resource(resourceName, clusterConfig, resourceConfig);
+    resourceMap.put(resourceName, resource);

Review comment:
       That is interesting. However, the next step of Task Framework will lead to the removal of this section of ResourceComputationStage - it no longer needs to create Resources from WorkflowConfigs/JobConfigs. (Check here https://github.com/apache/helix/wiki/Task-Framework-IdealState-Dependency-Removal-Progression) 
   
   I will put a TODO here in case the next step doesn't happen. 

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       `cleanupJobIdealStateExtView()` is kept for backward-compatibility purpose and is marked as deprecated now; it will be removed on a later phase. I suppose we put the on-demand rebalance in `cleanupJobIdealStateExtView()` now, and move it out when we delete `cleanupJobIdealStateExtView()`. Do you think that's necessary @jiajunwang ? Seems not to be too much difference. 

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       `cleanupJobIdealStateExtView()` is kept for backward-compatibility purpose and is marked as deprecated now; it will be removed on a later phase (it could trigger rebalance if the participant side is not updated, and it wouldn't trigger rebalance if the participant side is updated). I suppose we put the on-demand rebalance in `cleanupJobIdealStateExtView()` now, and move it out when we delete `cleanupJobIdealStateExtView()`. Do you think that's necessary @jiajunwang ? Seems not to be too much difference. 

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -124,17 +205,15 @@ public void process(ClusterEvent event) throws Exception {
 
           // don't overwrite ideal state settings
           if (!resourceMap.containsKey(resourceName)) {
-            addResource(resourceName, resourceMap);
-            Resource resource = resourceMap.get(resourceName);
+            Resource resource = new Resource(resourceName);
             resource.setStateModelDefRef(currentState.getStateModelDefRef());
             resource.setStateModelFactoryName(currentState.getStateModelFactoryName());
             resource.setBucketSize(currentState.getBucketSize());
             resource.setBatchMessageMode(currentState.getBatchMessageMode());
-            if (resource.getStateModelDefRef() == null && !isTaskCache
-                || resource.getStateModelDefRef() != null && (
-                resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME) && isTaskCache
-                    || !resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME)
-                    && !isTaskCache)) {
+            if (!isTaskCache && !TaskConstants.STATE_MODEL_NAME

Review comment:
       That's very smart - updating. 

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       `cleanupJobIdealStateExtView()` is kept for backward-compatibility purpose and is marked as deprecated now; it will be removed on a later phase (it could trigger rebalance if the participant side is not updated, and it wouldn't trigger rebalance if the participant side is updated). I suppose we can put the on-demand rebalance in `cleanupJobIdealStateExtView()` now, and move it out when we delete `cleanupJobIdealStateExtView()`. Do you think that's necessary @jiajunwang ? Seems not to be too much difference. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r480387061



##########
File path: helix-core/src/main/java/org/apache/helix/task/JobDispatcher.java
##########
@@ -85,15 +86,15 @@ public ResourceAssignment processJobStatusUpdateAndAssignment(String jobName,
     // completed)
     TaskState workflowState = workflowCtx.getWorkflowState();
     TaskState jobState = workflowCtx.getJobState(jobName);
-    // The job is already in a final state (completed/failed).
+    // Do not include workflowState == TIMED_OUT here, as later logic needs to handle this case

Review comment:
       No it's not related to this PR. I added it as I realized why we shouldn't have TIMED_OUT here, and other devs should know too. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r483839440



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -185,4 +232,21 @@ private void addPartition(String partition, String resourceName, Map<String, Res
     resource.addPartition(partition);
 
   }
+
+  private void addResourceConfigToResourceMap(String resourceName, ResourceConfig resourceConfig,

Review comment:
       I see that I had a bit misunderstanding on your original message: you're right that in the resource pipeline, `resourceToRebalance` doesn't contain task resources; in the task pipeline, `resourceToRebalance` only contains task resources. This behavior remains the same after IS removal (`resourceMap` contains all resources).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] narendly commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

narendly commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r483226992



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -98,6 +101,43 @@ public void process(ClusterEvent event) throws Exception {
       }
     }
 
+    // Add TaskFramework resources from workflow and job configs as Task Framework will no longer
+    // use IdealState
+    if (isTaskCache) {
+      WorkflowControllerDataProvider taskDataCache =
+          event.getAttribute(AttributeName.ControllerDataProvider.name());
+      for (Map.Entry<String, WorkflowConfig> workflowConfigEntry : taskDataCache
+          .getWorkflowConfigMap().entrySet()) {
+        // always overwrite, because the resource could be created by IS
+        String resourceName = workflowConfigEntry.getKey();
+        WorkflowConfig workflowConfig = workflowConfigEntry.getValue();
+        addResourceConfigToResourceMap(resourceName, workflowConfig, cache.getClusterConfig(),
+            resourceMap, resourceToRebalance);
+        addPartition(resourceName, resourceName, resourceMap);

Review comment:
       What does this do? `addPartition(resourceName, resourceName, resourceMap);`
   
   - Why are we doing an addPartition for a workflow?
   - Why are we putting resourceName twice?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r487200442



##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       `cleanupJobIdealStateExtView()` is kept for backward-compatibility purpose and is marked as deprecated now; it will be removed on a later phase (it could trigger rebalance if the participant side is not updated, and it wouldn't trigger rebalance if the participant side is updated). I suppose we put the on-demand rebalance in `cleanupJobIdealStateExtView()` now, and move it out when we delete `cleanupJobIdealStateExtView()`. Do you think that's necessary @jiajunwang ? Seems not to be too much difference. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] narendly commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

narendly commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r483231899



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -185,4 +232,21 @@ private void addPartition(String partition, String resourceName, Map<String, Res
     resource.addPartition(partition);
 
   }
+
+  private void addResourceConfigToResourceMap(String resourceName, ResourceConfig resourceConfig,

Review comment:
       So it seems that all `WorkflowConfigs` and `JobConfigs` are being added to both `resourceMap` and `resourceToRebalance`. 
   
   If my understanding of the pipeline is up-to-date, the reason we have two different maps is for the resource pipeline to filter out TASK resources and save them into `resourceToRebalance`.
   So 
   - `resourceMap`: all resources 
   - `resourceToRebalance`: all resources - TASK resources
   
   Now, for the task pipeline, do we need to keep both maps? It seems that you're doing the filtering by populating things from `ResourceConfigs` only + `currentStates` to `resourceToRebalance`. With that said, what's the point of populating/keeping `resourceMap`? Does `resourceMap` still contain ALL resources for the task pipeline as well?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] narendly commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

narendly commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r483275335



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -185,4 +232,21 @@ private void addPartition(String partition, String resourceName, Map<String, Res
     resource.addPartition(partition);
 
   }
+
+  private void addResourceConfigToResourceMap(String resourceName, ResourceConfig resourceConfig,

Review comment:
       > With that said, your assessment on `resourceToRebalance` isn't entirely accurate: when the pipeline is for tasks, `resourceToRebalance` contains all the task resources as well.
   
   What's not accurate? Do you mean to say that for the resource pipeline, resourceToRebalance contains task resources? For the task pipeline, should resourceMap contain all resources and resourceToRebalance contain only task resources?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] narendly commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

narendly commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r483227280



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -98,6 +101,43 @@ public void process(ClusterEvent event) throws Exception {
       }
     }
 
+    // Add TaskFramework resources from workflow and job configs as Task Framework will no longer
+    // use IdealState
+    if (isTaskCache) {
+      WorkflowControllerDataProvider taskDataCache =
+          event.getAttribute(AttributeName.ControllerDataProvider.name());
+      for (Map.Entry<String, WorkflowConfig> workflowConfigEntry : taskDataCache
+          .getWorkflowConfigMap().entrySet()) {
+        // always overwrite, because the resource could be created by IS
+        String resourceName = workflowConfigEntry.getKey();
+        WorkflowConfig workflowConfig = workflowConfigEntry.getValue();
+        addResourceConfigToResourceMap(resourceName, workflowConfig, cache.getClusterConfig(),
+            resourceMap, resourceToRebalance);
+        addPartition(resourceName, resourceName, resourceMap);
+      }
+
+      for (Map.Entry<String, JobConfig> jobConfigEntry : taskDataCache.getJobConfigMap()
+          .entrySet()) {
+        // always overwrite, because the resource could be created by IS
+        String resourceName = jobConfigEntry.getKey();
+        JobConfig jobConfig = jobConfigEntry.getValue();
+        addResourceConfigToResourceMap(resourceName, jobConfig, cache.getClusterConfig(),
+            resourceMap, resourceToRebalance);
+        int numPartitions = jobConfig.getTaskConfigMap().size();
+        if (numPartitions == 0 && idealStates != null) {
+          IdealState targetIs = idealStates.get(jobConfig.getTargetResource());
+          if (targetIs == null) {
+            LOG.warn("Target resource does not exist for job " + resourceName);
+          } else {
+            numPartitions = targetIs.getPartitionSet().size();
+          }
+        }
+        for (int i = 0; i < numPartitions; i++) {
+          addPartition(resourceName + "_" + i, resourceName, resourceMap);
+        }
+      }
+    }

Review comment:
       Could this whole block be further refactored into a private method for readability? It would be helpful to add a JavaDoc as well.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] narendly commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

narendly commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r483234401



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -98,6 +101,43 @@ public void process(ClusterEvent event) throws Exception {
       }
     }
 
+    // Add TaskFramework resources from workflow and job configs as Task Framework will no longer
+    // use IdealState
+    if (isTaskCache) {
+      WorkflowControllerDataProvider taskDataCache =
+          event.getAttribute(AttributeName.ControllerDataProvider.name());
+      for (Map.Entry<String, WorkflowConfig> workflowConfigEntry : taskDataCache
+          .getWorkflowConfigMap().entrySet()) {
+        // always overwrite, because the resource could be created by IS
+        String resourceName = workflowConfigEntry.getKey();
+        WorkflowConfig workflowConfig = workflowConfigEntry.getValue();
+        addResourceConfigToResourceMap(resourceName, workflowConfig, cache.getClusterConfig(),
+            resourceMap, resourceToRebalance);
+        addPartition(resourceName, resourceName, resourceMap);
+      }
+
+      for (Map.Entry<String, JobConfig> jobConfigEntry : taskDataCache.getJobConfigMap()
+          .entrySet()) {
+        // always overwrite, because the resource could be created by IS
+        String resourceName = jobConfigEntry.getKey();
+        JobConfig jobConfig = jobConfigEntry.getValue();
+        addResourceConfigToResourceMap(resourceName, jobConfig, cache.getClusterConfig(),
+            resourceMap, resourceToRebalance);
+        int numPartitions = jobConfig.getTaskConfigMap().size();
+        if (numPartitions == 0 && idealStates != null) {
+          IdealState targetIs = idealStates.get(jobConfig.getTargetResource());
+          if (targetIs == null) {
+            LOG.warn("Target resource does not exist for job " + resourceName);
+          } else {
+            numPartitions = targetIs.getPartitionSet().size();
+          }
+        }

Review comment:
       Nit: could we add some comments here to make it clear that we're doing this for targeted jobs? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] NealSun96 commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

NealSun96 commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r487202332



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java
##########
@@ -124,17 +205,15 @@ public void process(ClusterEvent event) throws Exception {
 
           // don't overwrite ideal state settings
           if (!resourceMap.containsKey(resourceName)) {
-            addResource(resourceName, resourceMap);
-            Resource resource = resourceMap.get(resourceName);
+            Resource resource = new Resource(resourceName);
             resource.setStateModelDefRef(currentState.getStateModelDefRef());
             resource.setStateModelFactoryName(currentState.getStateModelFactoryName());
             resource.setBucketSize(currentState.getBucketSize());
             resource.setBatchMessageMode(currentState.getBatchMessageMode());
-            if (resource.getStateModelDefRef() == null && !isTaskCache
-                || resource.getStateModelDefRef() != null && (
-                resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME) && isTaskCache
-                    || !resource.getStateModelDefRef().equals(TaskConstants.STATE_MODEL_NAME)
-                    && !isTaskCache)) {
+            if (!isTaskCache && !TaskConstants.STATE_MODEL_NAME

Review comment:
       That's very smart - updating. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] alirezazamani commented on a change in pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

alirezazamani commented on a change in pull request #1326:
URL: https://github.com/apache/helix/pull/1326#discussion_r485073804



##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -519,6 +520,8 @@ protected void handleJobTimeout(JobContext jobCtx, WorkflowContext workflowCtx,
     _clusterStatusMonitor.updateJobCounters(jobCfg, TaskState.TIMED_OUT);
     _rebalanceScheduler.removeScheduledRebalance(jobResource);
     TaskUtil.cleanupJobIdealStateExtView(_manager.getHelixDataAccessor(), jobResource);
+    // New pipeline trigger for workflow status update
+    RebalanceUtil.scheduleOnDemandPipeline(_manager.getClusterName(),0L,false);

Review comment:
       I think this will be overkill as well and not necessary. Basically, I am looking the context as controller's output and reacting on output is not ideal from my prospective. Also, unlike regular resources, context/output changes are much more frequent in TF resources. So there will be many redundant pipeline runs and can cause other issue. In my opinion, we can stick to ondemand rebalancer for these corner cases.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org

[GitHub] [helix] alirezazamani commented on pull request #1326: Task Framework IdealState Removal

Posted by GitBox <gi...@apache.org>.

alirezazamani commented on pull request #1326:
URL: https://github.com/apache/helix/pull/1326#issuecomment-685895016


   @NealSun96 Thanks for your PR. I think the PR is in good shape now. Can you please add an integration test for resource to drop? That can conclude this PR. In the meanwhile, I will review the code again. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org