Posted to commits@helix.apache.org by "Hudson (JIRA)" <ji...@apache.org> on 2018/07/16 19:12:00 UTC

[jira] [Commented] (HELIX-730) [TASK] Add ThreadCountBasedAssignmentCalculator and integrate with Workflow/JobRebalancer and fix rebalancing logic

    [ https://issues.apache.org/jira/browse/HELIX-730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16545636#comment-16545636 ] 

Hudson commented on HELIX-730:
------------------------------

FAILURE: Integrated in Jenkins build helix #1512 (See [https://builds.apache.org/job/helix/1512/])
[HELIX-730] Add ThreadCountBasedAssignmentCalculator and integrate with (narendly: rev 4db61b56e473b64ec9956f694dd2ac6a8d328ed4)
* (edit) helix-core/src/main/java/org/apache/helix/task/TaskRebalancer.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestWorkflowTimeout.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestJobTimeoutTaskNotStarted.java
* (add) helix-core/src/test/java/org/apache/helix/integration/task/TestQuotaBasedScheduling.java
* (add) helix-core/src/test/java/org/apache/helix/integration/task/TestTaskAssignmentCalculator.java
* (edit) helix-core/src/main/java/org/apache/helix/task/JobRebalancer.java
* (edit) helix-core/src/main/java/org/apache/helix/task/JobConfig.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestJobFailureTaskNotStarted.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRebalancer.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRebalancerRetryLimit.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestWorkflowJobDependency.java
* (edit) helix-core/src/test/java/org/apache/helix/task/TestSemiAutoStateTransition.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestJobFailure.java
* (edit) helix-core/src/main/java/org/apache/helix/task/assigner/ThreadCountBasedTaskAssigner.java
* (edit) helix-core/src/main/java/org/apache/helix/task/FixedTargetTaskAssignmentCalculator.java
* (edit) helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/controller/TestTargetExternalView.java
* (edit) helix-core/src/main/java/org/apache/helix/task/WorkflowRebalancer.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestIndependentTaskRebalancer.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestWorkflowTermination.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/TestBatchEnableInstances.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestJobTimeout.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestJobFailureHighThreshold.java
* (edit) helix-core/src/main/java/org/apache/helix/task/assigner/AssignableInstance.java
* (edit) helix-core/src/main/java/org/apache/helix/controller/stages/ClusterDataCache.java
* (edit) helix-core/src/main/java/org/apache/helix/task/TaskAssignmentCalculator.java
* (edit) helix-core/src/main/java/org/apache/helix/model/ClusterConfig.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestDeleteWorkflow.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestRebalanceRunningTask.java
* (edit) helix-core/src/main/java/org/apache/helix/task/AssignableInstanceManager.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/TestStateTransitionCancellation.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/controller/TestClusterMaintenanceMode.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRetryDelay.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/manager/TestZkHelixAdmin.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestStopWorkflow.java
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRebalancerFailover.java
* (add) helix-core/src/main/java/org/apache/helix/task/ThreadCountBasedTaskAssignmentCalculator.java
* (edit) helix-core/src/main/java/org/apache/helix/task/TaskUtil.java
* (delete) helix-core/src/test/java/org/apache/helix/integration/task/TestGenericTaskAssignmentCalculator.java
* (edit) helix-core/src/test/java/org/apache/helix/task/TaskSynchronizedTestBase.java


> [TASK] Add ThreadCountBasedAssignmentCalculator and integrate with Workflow/JobRebalancer and fix rebalancing logic
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: HELIX-730
>                 URL: https://issues.apache.org/jira/browse/HELIX-730
>             Project: Apache Helix
>          Issue Type: Improvement
>            Reporter: Hunter L
>            Priority: Major
>
> For quota-based scheduling of tasks, we have added the TaskAssigner interface, which takes AssignableInstances into account by way of AssignableInstanceManager. To use this in the existing pipeline (prior to Task Framework 2.0), GenericTaskAssignmentCalculator was replaced with ThreadCountBasedAssignmentCalculator, a wrapper around TaskAssigner. The necessary adjustments were made in Workflow/JobRebalancer for this replacement, and the rebalance logic in Workflow/JobRebalancer was also reviewed and fixed. Additionally, TestQuotaBasedScheduling was added to test quota-based task scheduling. Note that quotas will apply to both generic and targeted jobs.
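As a rough illustration of thread-count-based assignment, here is a minimal Java sketch. It assumes each instance exposes a number of free thread slots per quota type and places each task on the instance with the most free slots for that type; tasks that find no free slot stay unassigned. The Slots class, the assign() method, and the instance names are hypothetical. They are not Helix's TaskAssigner, AssignableInstance, or ThreadCountBasedTaskAssigner APIs, and the real assigner may order and filter instances differently.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ThreadCountAssignmentSketch {

  /** Hypothetical stand-in for an AssignableInstance: remaining thread slots per quota type. */
  static final class Slots {
    final String instanceName;
    final Map<String, Integer> remaining = new HashMap<>();

    Slots(String instanceName, Map<String, Integer> capacity) {
      this.instanceName = instanceName;
      this.remaining.putAll(capacity); // copy so the slot counts can be decremented
    }

    int remainingFor(String quotaType) {
      return remaining.getOrDefault(quotaType, 0);
    }
  }

  /**
   * Assigns each task to the instance with the most free slots for the given quota type
   * (ties go to the earlier instance in the list). Tasks that cannot be placed are left out
   * of the result and would have to wait for a later rebalance.
   */
  static Map<String, String> assign(List<String> taskIds, String quotaType, List<Slots> instances) {
    Map<String, String> assignment = new LinkedHashMap<>();
    for (String taskId : taskIds) {
      Slots best = null;
      for (Slots s : instances) {
        int free = s.remainingFor(quotaType);
        if (free > 0 && (best == null || free > best.remainingFor(quotaType))) {
          best = s;
        }
      }
      if (best == null) {
        break; // quota exhausted on every instance; capacity only shrinks, so stop here
      }
      best.remaining.merge(quotaType, -1, Integer::sum); // consume one thread slot
      assignment.put(taskId, best.instanceName);
    }
    return assignment;
  }

  public static void main(String[] args) {
    List<Slots> instances = List.of(
        new Slots("localhost_12918", Map.of("A", 2)),
        new Slots("localhost_12919", Map.of("A", 1)));
    System.out.println(assign(List.of("t0", "t1", "t2", "t3"), "A", instances));
  }
}
```

Running main() prints {t0=localhost_12918, t1=localhost_12918, t2=localhost_12919}. Task t3 stays unassigned until slots free up, which is why releasing the slots held by non-active jobs (items 9 and 10 of the changelist) matters for a quota-based scheme.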
> A few bugs were uncovered during this process, such as faulty retry logic that never actually got tasks to restart. For more details, see the changelist below:
> Changelist:
>     1. Add ThreadCountBasedAssignmentCalculator, a wrapper around ThreadCountBasedTaskAssigner
>     2. Make logic changes in JobRebalancer to enable the use of ThreadCountBasedAssignmentCalculator
>     3. Fix a failing test by using a thread-safe map, and rename TestGenericTaskAssignmentCalculator to TestTaskAssignmentCalculator to better reflect what its tests do
>     4. Add retry logic that was previously absent for INIT and DROPPED tasks in JobRebalancer
>     5. Add TestQuotaBasedScheduling to test that jobs and tasks are assigned and scheduled according to the quota config set in ClusterConfig
>     6. Add more log messages to aid with task-scheduling debugging in AssignableInstance
>     7. In AbstractTaskDispatcher, retry logic was newly implemented for tasks that are STOPPED, TIMED_OUT, or TASK_ERROR so that they are restarted correctly
>     8. In AbstractTaskDispatcher, when enforcing overlapAssign for jobs with isAllowOverlapAssignment(), a fix was implemented so that only jobs whose state is IN_PROGRESS are considered
>     9. In AbstractTaskDispatcher, the isWorkflowFinished() method was modified so that the resources held by tasks of non-active jobs are released from AssignableInstances to prevent a resource leak
>    10. In markJobFailed() and markJobCompleted(), the resources held by tasks of non-active jobs are released from AssignableInstances to prevent a resource leak
>    11. Fix the logic so that quotas do not apply to targeted jobs
>    12. Fix TestTaskRebalancer (assumes Consistent Hashing, which is no longer used)
>    13. Fix TestIndependentTaskRebalancer (assumes Consistent Hashing, no longer used)
>    14. Assignment logic was improved so that incomplete tasks whose assigned participants are no longer live will be re-assigned accordingly
>    15. Fix TestTaskRebalancerFailover (tasks on non-live instances will be reassigned promptly)
>    16. Fix TestRebalanceRunningTask (targeted jobs will get tasks reassigned upon liveInstance and currentState change)
>    17. Fix a bug in FixedTargetTaskAssignmentCalculator and in the assignment logic for targeted jobs so that a task index is no longer assigned multiple times
>    18. Fix TestJobFailureTaskNotStarted (tasks were not being assigned at all due to having reached maximum capacity for quota)
>    19. Add a targetedTaskConfigMap field in JobConfig to cache TaskConfig objects for targeted tasks, reducing object creation and GC overhead
>    20. Fix JobConfig so that it doesn't write quotaType to ZooKeeper when quotaType is null or not set
>    21. Fix deleteWorkflow() in TaskUtil so that the first delete failure causes the entire method to fail and return early (to avoid leaving other ZNodes broken by an incomplete deletion)
>    22. Fix TestDeleteWorkflow by adding another removeProperty() call to reduce its failure rate
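Item 21 describes a fail-fast deletion. A minimal Java sketch of that control flow, assuming the workflow's ZNodes are removed one at a time in a fixed order: the first failed removal makes the whole call return false, and the remaining paths are never touched. The deleter predicate and the example paths are hypothetical stand-ins for ZooKeeper removal calls; this does not reproduce TaskUtil.deleteWorkflow() or its actual deletion order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class FailFastDeleteSketch {

  /**
   * Deletes paths in order and stops at the first failure, leaving later paths intact.
   * The deleter is a hypothetical stand-in for a ZooKeeper removal call that returns
   * true on success. Successfully removed paths are recorded in 'deleted'.
   */
  static boolean deleteAll(List<String> paths, Predicate<String> deleter, List<String> deleted) {
    for (String path : paths) {
      if (!deleter.test(path)) {
        return false; // report failure for the whole operation; skip the remaining paths
      }
      deleted.add(path);
    }
    return true;
  }

  public static void main(String[] args) {
    List<String> deleted = new ArrayList<>();
    // Hypothetical paths, for illustration only.
    List<String> paths = List.of(
        "/example/workflow/config", "/example/workflow/context", "/example/workflow/idealState");
    // Simulate a failure when removing the second path.
    boolean ok = deleteAll(paths, p -> !p.endsWith("/context"), deleted);
    System.out.println(ok + " " + deleted);
  }
}
```

Running main() prints false [/example/workflow/config]: the caller learns that the deletion failed, and the third path is left in place rather than being removed after an earlier step already broke.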



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)