Posted to commits@dolphinscheduler.apache.org by GitBox <gi...@apache.org> on 2022/09/30 07:20:01 UTC

[GitHub] [dolphinscheduler] wangfann opened a new issue, #12231: [Bug] [service] Some scheduled tasks are not triggered on time

wangfann opened a new issue, #12231:
URL: https://github.com/apache/dolphinscheduler/issues/12231

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### What happened
   
   In the early morning, a colleague from the data warehouse team reported that tasks added the previous day had not been triggered on time. The problem is severe and is seriously affecting production.
   
   Investigation showed the cause. When the scheduling platform fires a large number of tasks at the same moment, the Quartz plugin's default parameters, combined with the parameters used to initialize the cronTrigger object, push scheduled tasks past the misfire threshold, and those tasks are never triggered.
   
   The defaults are org.quartz.threadPool.threadCount = 25 and org.quartz.jobStore.misfireThreshold = 60000 (60 s). The cronTrigger misfire instruction is MISFIRE_INSTRUCTION_DO_NOTHING, which simply discards the misfired run.
   
   Tasks are lost when a large number of tasks are scheduled for the same moment. In this case the data warehouse team had submitted nearly 4,000 tasks in a batch, all due at the same time, with 2 masters online. Increasing org.quartz.threadPool.threadCount and adding masters did not significantly change the number of tasks processed within 60 s. The Quartz source code shows why: Quartz relies on a pessimistic database lock to guarantee concurrency safety.
   
   We therefore recommend changing the cronTrigger misfire instruction to fire immediately. Tasks would then not be lost when the system cannot keep up; they would only run late. Concretely, the misfire API call should change from withMisfireHandlingInstructionDoNothing to withMisfireHandlingInstructionFireAndProceed.
   ![image](https://user-images.githubusercontent.com/6930421/193210955-123a5348-e45a-46dc-b3cb-d55fd9ba933b.png)
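   The two misfire instructions can be compared with a small, stdlib-only Java model. This is a simplified illustration, not Quartz's implementation. The class `MisfireModel`, its enums, and the `decide` method are hypothetical names. The model assumes one rule: a run picked up more than `misfireThreshold` milliseconds after its scheduled time counts as misfired, and the instruction then decides whether the run is skipped or fired once immediately. In Quartz itself the choice is made on `CronScheduleBuilder` with `withMisfireHandlingInstructionDoNothing()` or `withMisfireHandlingInstructionFireAndProceed()`.
   
   ```java
   // Hypothetical, simplified model of how a cron trigger treats a late run.
   // It mirrors the behaviour described in this issue; it is NOT Quartz's own code.
   public class MisfireModel {
   
       /** Mirrors the two CronScheduleBuilder options discussed in the issue. */
       public enum Instruction { DO_NOTHING, FIRE_AND_PROCEED }
   
       /** What happens to one scheduled run. */
       public enum Outcome { FIRED_WITHIN_THRESHOLD, SKIPPED, FIRED_ONCE_NOW }
   
       /**
        * @param scheduledMs        the run's scheduled fire time (epoch millis)
        * @param pickedUpMs         when the scheduler actually gets to the trigger
        * @param misfireThresholdMs value of org.quartz.jobStore.misfireThreshold (default 60000)
        * @param instruction        misfire instruction configured on the cron trigger
        */
       public static Outcome decide(long scheduledMs, long pickedUpMs,
                                    long misfireThresholdMs, Instruction instruction) {
           long lateness = pickedUpMs - scheduledMs;
           if (lateness <= misfireThresholdMs) {
               // Still inside the threshold: the run fires, just somewhat late.
               return Outcome.FIRED_WITHIN_THRESHOLD;
           }
           // Past the threshold the trigger counts as misfired; the instruction decides.
           return instruction == Instruction.DO_NOTHING
                   ? Outcome.SKIPPED         // run is dropped; the next cron slot is used instead
                   : Outcome.FIRED_ONCE_NOW; // run fires immediately, then the schedule resumes
       }
   
       public static void main(String[] args) {
           long threshold = 60_000L;  // default misfireThreshold
           long scheduled = 0L;
           long pickedUp = 90_000L;   // 90 s late, e.g. stuck behind a burst of triggers
           System.out.println(decide(scheduled, pickedUp, threshold, Instruction.DO_NOTHING));
           System.out.println(decide(scheduled, pickedUp, threshold, Instruction.FIRE_AND_PROCEED));
       }
   }
   ```
   
   With the default 60-second threshold, `main` prints `SKIPPED` and then `FIRED_ONCE_NOW`. A run picked up 90 seconds late is dropped under DO_NOTHING but still executed under FIRE_AND_PROCEED, which is the behaviour change this issue asks for.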
   
   
   ### What you expected to happen
   
   Change the default misfire strategy so that, at a minimum, tasks are not lost.
   
   ### How to reproduce
   
   Schedule a large number of tasks to fire at the same time (in the reported case, nearly 4,000 tasks with 2 masters online).
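   Rough arithmetic helps explain why more threads and masters did not help. The sketch below makes a hypothetical assumption: every trigger in the burst must hold one cluster-wide database row lock for a fixed `lockCostMs` while it is acquired, so acquisitions are strictly serialized. The 30 ms cost is an illustrative number, not a measurement from this issue. `BurstModel` and its methods are hypothetical names, and the 4 x 50 worker configuration is likewise hypothetical.
   
   ```java
   // Back-of-the-envelope model of a burst of cron triggers that are all due at the same instant.
   // Hypothetical assumption: each trigger holds one shared database row lock for lockCostMs
   // while it is acquired, so acquisitions are serialized across every master and thread.
   public class BurstModel {
   
       /** Triggers picked up within the misfire threshold when a shared lock serializes them. */
       public static long onTimeWithSharedLock(long triggers, long lockCostMs,
                                               long misfireThresholdMs, int workers) {
           // Trigger k (0-based) gets the lock at k * lockCostMs however many workers are waiting,
           // so it is on time iff k * lockCostMs <= misfireThresholdMs.
           // 'workers' (masters x threadCount) is intentionally unused: extra workers just queue on the lock.
           long onTime = misfireThresholdMs / lockCostMs + 1;
           return Math.min(triggers, onTime);
       }
   
       /** Contrast case: without a shared lock, each worker could acquire triggers independently. */
       public static long onTimeWithoutSharedLock(long triggers, long lockCostMs,
                                                  long misfireThresholdMs, int workers) {
           long onTime = (long) workers * (misfireThresholdMs / lockCostMs + 1);
           return Math.min(triggers, onTime);
       }
   
       public static void main(String[] args) {
           long triggers = 4_000;    // roughly the burst size reported in the issue
           long threshold = 60_000;  // default org.quartz.jobStore.misfireThreshold
           long cost = 30;           // hypothetical per-trigger lock hold time in ms
           // 2 masters x 25 threads (reported setup) vs. a hypothetical 4 masters x 50 threads
           for (int workers : new int[] {2 * 25, 4 * 50}) {
               long onTime = onTimeWithSharedLock(triggers, cost, threshold, workers);
               System.out.printf("workers=%d on time=%d misfired=%d%n",
                       workers, onTime, triggers - onTime);
           }
       }
   }
   ```
   
   Under these assumptions both configurations keep 2,001 of the 4,000 triggers within the threshold, and the remaining 1,999 misfire. With MISFIRE_INSTRUCTION_DO_NOTHING those runs are discarded, whereas fire-and-proceed would execute them late. A higher worker count only helps in the contrast model, where there is no shared lock.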
   
   ### Anything else
   
   Every version of the project has this problem.
   
   ### Version
   
   3.0.x
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [dolphinscheduler] davidzollo closed issue #12231: [Bug] [service] Some scheduled tasks are not triggered on time

Posted by GitBox <gi...@apache.org>.
davidzollo closed issue #12231: [Bug] [service] Some scheduled tasks are not triggered on time
URL: https://github.com/apache/dolphinscheduler/issues/12231




[GitHub] [dolphinscheduler] github-actions[bot] commented on issue #12231: [Bug] [service] Some scheduled tasks are not triggered on time

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #12231:
URL: https://github.com/apache/dolphinscheduler/issues/12231#issuecomment-1263202368





[GitHub] [dolphinscheduler] github-actions[bot] commented on issue #12231: [Bug] [service] Some scheduled tasks are not triggered on time

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #12231:
URL: https://github.com/apache/dolphinscheduler/issues/12231#issuecomment-1263202601

   Thank you for your feedback. We have received your issue; please wait patiently for a reply.
   * To help us understand your request as soon as possible, please provide detailed information, the version, or screenshots.
   * If you haven't received a reply for a long time, you can [join our slack](https://s.apache.org/dolphinscheduler-slack) and send your question to channel `#troubleshooting`

