You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@dolphinscheduler.apache.org by GitBox <gi...@apache.org> on 2022/08/13 09:25:27 UTC

[GitHub] [dolphinscheduler] caishunfeng commented on a diff in pull request #11464: [Feature-11463] Increase the dispatch failure retry queue.

caishunfeng commented on code in PR #11464:
URL: https://github.com/apache/dolphinscheduler/pull/11464#discussion_r945118001


##########
dolphinscheduler-master/src/main/java/org/apache/dolphinscheduler/server/master/consumer/TaskPriorityQueueConsumer.java:
##########
@@ -151,6 +173,10 @@ public List<TaskPriority> batchDispatch(int fetchTaskNum) throws TaskPriorityQue
         List<TaskPriority> failedDispatchTasks = Collections.synchronizedList(new ArrayList<>());
         CountDownLatch latch = new CountDownLatch(fetchTaskNum);
 
+        if (taskPriorityDispatchFailedQueue.size() > 0) {
+            dispatchFailedBackToTaskPriorityQueue(fetchTaskNum);

Review Comment:
   Will it take affect to the performance if handle in the same thread? 



##########
dolphinscheduler-master/src/main/java/org/apache/dolphinscheduler/server/master/consumer/TaskPriorityQueueConsumer.java:
##########
@@ -176,6 +203,41 @@ public List<TaskPriority> batchDispatch(int fetchTaskNum) throws TaskPriorityQue
         return failedDispatchTasks;
     }
 
+    /**
+     * put the failed dispatch task into the dispatch queue again
+     */
+    private void dispatchFailedBackToTaskPriorityQueue(int fetchTaskNum) throws TaskPriorityQueueException, InterruptedException {
+        try {
+            for (int i = 0; i < fetchTaskNum; i++) {
+                TaskPriority dispatchFailedTaskPriority = taskPriorityDispatchFailedQueue.poll(Constants.SLEEP_TIME_MILLIS, TimeUnit.MILLISECONDS);
+                if (Objects.isNull(dispatchFailedTaskPriority)){
+                    continue;
+                }
+                if (canRetry(dispatchFailedTaskPriority)){
+                    dispatchFailedTaskPriority.setDispatchFailedRetryTimes(dispatchFailedTaskPriority.getDispatchFailedRetryTimes() + 1);
+                    taskPriorityQueue.put(dispatchFailedTaskPriority);
+                } else {
+                    taskPriorityDispatchFailedQueue.put(dispatchFailedTaskPriority);

Review Comment:
   If can not retry, I think it should be removed from fail queue, otherwise this fail queue will keep growing and not release.



##########
dolphinscheduler-master/src/main/java/org/apache/dolphinscheduler/server/master/consumer/TaskPriorityQueueConsumer.java:
##########
@@ -176,6 +203,41 @@ public List<TaskPriority> batchDispatch(int fetchTaskNum) throws TaskPriorityQue
         return failedDispatchTasks;
     }
 
+    /**
+     * put the failed dispatch task into the dispatch queue again
+     */
+    private void dispatchFailedBackToTaskPriorityQueue(int fetchTaskNum) throws TaskPriorityQueueException, InterruptedException {
+        try {
+            for (int i = 0; i < fetchTaskNum; i++) {
+                TaskPriority dispatchFailedTaskPriority = taskPriorityDispatchFailedQueue.poll(Constants.SLEEP_TIME_MILLIS, TimeUnit.MILLISECONDS);
+                if (Objects.isNull(dispatchFailedTaskPriority)){
+                    continue;
+                }
+                if (canRetry(dispatchFailedTaskPriority)){
+                    dispatchFailedTaskPriority.setDispatchFailedRetryTimes(dispatchFailedTaskPriority.getDispatchFailedRetryTimes() + 1);
+                    taskPriorityQueue.put(dispatchFailedTaskPriority);
+                } else {
+                    taskPriorityDispatchFailedQueue.put(dispatchFailedTaskPriority);
+                }
+            }
+        } catch (Exception e) {
+            logger.error("dispatch failed back to task priority queue error", e);
+        }
+    }
+
+    /**
+     * the time interval is adjusted according to the number of retries
+     */
+    private boolean canRetry (TaskPriority taskPriority){
+        int dispatchFailedRetryTimes = taskPriority.getDispatchFailedRetryTimes();
+        long now = System.currentTimeMillis();
+        // retry more than 100 times with 100 seconds delay each time
+        if (dispatchFailedRetryTimes >= Constants.DEFAULT_MAX_RETRY_COUNT){
+            return now - taskPriority.getLastDispatchTime() >= TIME_DELAY[Constants.DEFAULT_MAX_RETRY_COUNT];

Review Comment:
   will it out of bounds if use `TIME_DELAY[Constants.DEFAULT_MAX_RETRY_COUNT]`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org