Posted to commits@dolphinscheduler.apache.org by GitBox <gi...@apache.org> on 2022/09/30 07:20:30 UTC

[GitHub] [dolphinscheduler] github-actions[bot] commented on issue #12231: [Bug] [service] Some scheduled tasks are not triggered on time

github-actions[bot] commented on issue #12231:
URL: https://github.com/apache/dolphinscheduler/issues/12231#issuecomment-1263202368

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### What happened
   
   Early in the morning, a co-worker from the data warehouse team reported that newly added tasks were not triggered on time. The problem seriously affected production.
   
   After investigation, we found that the default parameters of the Quartz plugin, combined with the initialization parameters of the cronTrigger object, cause scheduled tasks to exceed the misfire threshold when a large number of tasks are triggered at the same time on the scheduling platform. As a result, those tasks are never triggered. The defaults are org.quartz.threadPool.threadCount = 25 and org.quartz.jobStore.misfireThreshold = 60000, and the cronTrigger misfire handling strategy is MISFIRE_INSTRUCTION_DO_NOTHING, that is, the missed firing is simply abandoned.
   
   Tasks are lost when a large number of them are scheduled at the same time: the data warehouse co-worker submitted nearly 4,000 tasks in batches for a single point in time (with 2 masters online). After increasing org.quartz.threadPool.threadCount and the number of masters, the number of tasks processed within 60s did not change significantly (judging from the Quartz source code, it uses database pessimistic locks to ensure concurrency safety).
   
   Therefore, we recommend changing cronTrigger's misfire handling to a fire-immediately strategy, so that a task is delayed rather than lost when the system cannot keep up. That is, the misfire API call should change from withMisfireHandlingInstructionDoNothing to withMisfireHandlingInstructionFireAndProceed.
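   
   The effect described above can be illustrated with a small stand-alone sketch. It does not use Quartz itself: it is a simplified model that assumes triggers are handled strictly one after another at a fixed rate, which is a rough stand-in for the database-lock serialization mentioned above. The class `MisfireSketch`, the method `simulate`, and the rate of 50 triggers per second are hypothetical and chosen only for illustration; only the 60000 ms threshold, the 4,000-task burst, and the two policy names come from this report.
   
   ```java
   public class MisfireSketch {
   
       /** The two misfire policies discussed in this issue. */
       public enum Policy { DO_NOTHING, FIRE_AND_PROCEED }
   
       /** Counts for one simulated burst of triggers due at the same instant. */
       public static final class Outcome {
           public final int onTime;
           public final int firedLate;
           public final int dropped;
   
           Outcome(int onTime, int firedLate, int dropped) {
               this.onTime = onTime;
               this.firedLate = firedLate;
               this.dropped = dropped;
           }
       }
   
       /**
        * Simplified model (an assumption, not Quartz's real algorithm):
        * all triggers are due at t=0 and are handled one by one at a fixed rate.
        * A trigger reached later than the misfire threshold counts as misfired.
        */
       public static Outcome simulate(int taskCount, double triggersPerSecond,
                                      long misfireThresholdMs, Policy policy) {
           int onTime = 0;
           int firedLate = 0;
           int dropped = 0;
           for (int i = 0; i < taskCount; i++) {
               // Delay before the scheduler gets around to the i-th trigger.
               long delayMs = (long) (i * 1000.0 / triggersPerSecond);
               if (delayMs <= misfireThresholdMs) {
                   onTime++;                 // within the threshold: fires normally
               } else if (policy == Policy.FIRE_AND_PROCEED) {
                   firedLate++;              // misfired, but still executed (delayed)
               } else {
                   dropped++;                // misfired and skipped: this firing is lost
               }
           }
           return new Outcome(onTime, firedLate, dropped);
       }
   
       public static void main(String[] args) {
           long thresholdMs = 60_000L;   // org.quartz.jobStore.misfireThreshold from the report
           double assumedRate = 50.0;    // hypothetical throughput, triggers per second
           for (Policy p : Policy.values()) {
               Outcome o = simulate(4000, assumedRate, thresholdMs, p);
               System.out.printf("%-16s onTime=%d firedLate=%d dropped=%d%n",
                       p, o.onTime, o.firedLate, o.dropped);
           }
       }
   }
   ```
   
   Under these assumed numbers, 999 of the 4,000 firings fall outside the threshold: with DO_NOTHING they are dropped, while with FIRE_AND_PROCEED they still run, only later. The real throughput depends on the deployment, so the figures show the shape of the problem rather than a measurement.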
   ![image](https://user-images.githubusercontent.com/6930421/193210955-123a5348-e45a-46dc-b3cb-d55fd9ba933b.png)
   
   
   ### What you expected to happen
   
   Change the default misfire strategy so that, at a minimum, tasks are not lost.
   
   ### How to reproduce
   
   A large number of scheduling tasks are triggered at the same time.
   
   ### Anything else
   
   Every version of the project has this problem.
   
   ### Version
   
   3.0.x
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

