Posted to commits@dolphinscheduler.apache.org by GitBox <gi...@apache.org> on 2020/11/19 07:46:45 UTC

[GitHub] [incubator-dolphinscheduler] lenboo opened a new issue #4083: [Improvement][Master]Master improvement, less thread consumption, less time consumption

lenboo opened a new issue #4083:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/4083


   Improvement for master

   ![image](https://user-images.githubusercontent.com/29528966/99616759-c70def80-2a58-11eb-9661-f8f0012354f5.png)

   Current master:

   ![image](https://user-images.githubusercontent.com/29528966/99615188-ea836b00-2a55-11eb-8fde-0b7bf7af91ca.png)

   Problems with the current master:

   1. There is a lot of polling, which adds unnecessary latency.

   2. A distributed lock is taken when a command is fetched, which becomes a concurrency bottleneck.

   3. Too many threads (nProcessInstance * nTaskInstances) are used, which wastes system resources.

   4. Polling the database puts query pressure on it, which becomes a bottleneck when the data volume is large.
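To make the thread-count problem concrete, here is a minimal sketch, not the actual master code, of handling task events on a fixed-size pool so that the thread count stays bounded by the pool size instead of growing with nProcessInstance * nTaskInstances. The class `BoundedTaskEventProcessor` and all of its methods are hypothetical names introduced only for this illustration.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: task events share a fixed pool instead of one thread per task instance. */
class BoundedTaskEventProcessor {
    private final ExecutorService pool;
    // Names of the worker threads that handled at least one event (illustration only).
    private final Set<String> workerThreads = ConcurrentHashMap.newKeySet();
    private final AtomicInteger handled = new AtomicInteger();

    BoundedTaskEventProcessor(int poolSize) {
        this.pool = Executors.newFixedThreadPool(poolSize);
    }

    /** Queue one event; no new thread is created per process or task instance. */
    void onTaskEvent(int processInstanceId, int taskInstanceId) {
        pool.submit(() -> {
            // A real handler would advance the state of (processInstanceId, taskInstanceId) here.
            workerThreads.add(Thread.currentThread().getName());
            handled.incrementAndGet();
        });
    }

    /** Stop accepting events and wait for queued ones; returns false on timeout or interrupt. */
    boolean shutdownAndWait(long timeoutSeconds) {
        pool.shutdown();
        try {
            return pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    int handledCount() {
        return handled.get();
    }

    int distinctWorkerThreads() {
        return workerThreads.size();
    }
}
```

With 10 process instances of 20 tasks each and a pool of 4, all 200 events are handled by at most 4 threads. The trade-off is that events wait in the pool's queue when all workers are busy.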
   
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-dolphinscheduler] lenboo commented on issue #4083: [Improvement][Master]Master improvement, less thread consumption, less time consumption

Posted by GitBox <gi...@apache.org>.
lenboo commented on issue #4083:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/4083#issuecomment-733631611


   Some ideas for reducing the number of threads:
   ![image](https://user-images.githubusercontent.com/29528966/100216364-2a15ef80-2f4d-11eb-8896-7d96d9811a32.png)
   
   ![image](https://user-images.githubusercontent.com/29528966/100218107-3dc25580-2f4f-11eb-94a0-fb5ca1cb9e52.png)
   
   
   
   





[GitHub] [incubator-dolphinscheduler] lenboo edited a comment on issue #4083: [Improvement][Master]Master improvement, less thread consumption, less time consumption

Posted by GitBox <gi...@apache.org>.
lenboo edited a comment on issue #4083:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/4083#issuecomment-730826151


   - A distributed lock is taken when a command is fetched, which becomes a concurrency bottleneck.
   
   Another idea to solve this problem:
   ![image](https://user-images.githubusercontent.com/29528966/99754888-592efa00-2b24-11eb-95c8-3508b1949c6c.png)
   
   Scheduler (active/standby):
   1. Calculate the distribution policy according to the masters' resource usage
   2. Fetch the command and generate a ProcessInstance
   3. Send the ProcessInstance to a master according to the policy (priority...)
   
   Master fault tolerance:
   1. The other masters change the PI (ProcessInstance) to the fault-tolerant state
   2. Insert a fault-tolerance command
   3. The scheduler fetches the command and sends the PI to a master according to the policy (priority...)
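One possible shape for the "distribution policy according to master resource usage" step is sketched below. It is an assumption for illustration, not the proposed design: `MasterLoad`, `LeastLoadedPolicy`, and the equal-weight score over CPU and memory are all hypothetical, and the thread does not say which metrics or weights the scheduler would use.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical snapshot of one master's resource usage; both values are in the range 0.0 to 1.0. */
class MasterLoad {
    final String host;
    final double cpuLoad;
    final double memoryUsage;

    MasterLoad(String host, double cpuLoad, double memoryUsage) {
        this.host = host;
        this.cpuLoad = cpuLoad;
        this.memoryUsage = memoryUsage;
    }

    /** Assumed scoring: equal weight on CPU and memory; lower means less loaded. */
    double score() {
        return 0.5 * cpuLoad + 0.5 * memoryUsage;
    }
}

/** Hypothetical policy: send the next ProcessInstance to the least loaded master. */
class LeastLoadedPolicy {
    static Optional<MasterLoad> select(List<MasterLoad> masters) {
        // Empty when no master is registered, so the caller can keep the command queued.
        return masters.stream().min(Comparator.comparingDouble(MasterLoad::score));
    }
}
```

Because only the single active scheduler makes this decision, the masters would not need to compete for a distributed lock when commands are taken. The same selection could be reused when a fault-tolerance command is redistributed.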
   
   








[GitHub] [incubator-dolphinscheduler] lenboo commented on issue #4083: [Improvement][Master]Master improvement, less thread consumption, less time consumption

Posted by GitBox <gi...@apache.org>.
lenboo commented on issue #4083:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/4083#issuecomment-730823079


   > I think we can use a ZooKeeper watch (zk-watch) for monitoring. To avoid the pressure on ZooKeeper caused by frequent node changes, we need to rate-limit the events: for example, define a time window so that when a large number of events fire, at most one notification is sent per second. This lets us remove the related threads and avoid unnecessary polling. However, it has its own problem: ZooKeeper notifications are delivered serially, which can leave the load on the master nodes severely skewed. We may therefore also need to handle this on the master side, for example with a modulo-based design.
   
   Good idea!





[GitHub] [incubator-dolphinscheduler] CalvinKirs commented on issue #4083: [Improvement][Master]Master improvement, less thread consumption, less time consumption

Posted by GitBox <gi...@apache.org>.
CalvinKirs commented on issue #4083:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/4083#issuecomment-730320583


   I think we can use a ZooKeeper watch (zk-watch) for monitoring. To avoid the pressure on ZooKeeper caused by frequent node changes, we need to rate-limit the events: for example, define a time window so that when a large number of events fire, at most one notification is sent per second. This lets us remove the related threads and avoid unnecessary polling. However, it has its own problem: ZooKeeper notifications are delivered serially, which can leave the load on the master nodes severely skewed. We may therefore also need to handle this on the master side, for example with a modulo-based design.
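The two ideas in this comment, a once-per-window notification limit and a modulo (取模) split of work across masters, can be sketched roughly as follows. This is only an illustration under assumptions: `WindowedNotifier` and `ModuloAssignment` are hypothetical names, no ZooKeeper API is used, and the clock is injected so the behaviour can be checked without waiting.

```java
import java.util.function.LongSupplier;

/** Hypothetical throttle: lets at most one notification through per time window. */
class WindowedNotifier {
    private final long windowMillis;
    private final LongSupplier clock; // injected time source in milliseconds
    private long lastSentAt;
    private boolean sentOnce = false;
    private boolean pending = false;  // an event was suppressed since the last send

    WindowedNotifier(long windowMillis, LongSupplier clock) {
        this.windowMillis = windowMillis;
        this.clock = clock;
    }

    /** Called for every watch event; returns true if a notification should go out now. */
    synchronized boolean onEvent() {
        long now = clock.getAsLong();
        if (!sentOnce || now - lastSentAt >= windowMillis) {
            sentOnce = true;
            lastSentAt = now;
            pending = false;
            return true;
        }
        pending = true; // coalesce into one later notification
        return false;
    }

    /** Called periodically; flushes a suppressed event once the window has elapsed. */
    synchronized boolean pollPending() {
        long now = clock.getAsLong();
        if (pending && now - lastSentAt >= windowMillis) {
            lastSentAt = now;
            pending = false;
            return true;
        }
        return false;
    }
}

/** Hypothetical modulo split: each command id belongs to exactly one master index. */
class ModuloAssignment {
    static boolean ownedBy(long commandId, int masterIndex, int masterCount) {
        return Math.floorMod(commandId, masterCount) == masterIndex;
    }
}
```

The `pollPending` call is there so that the last change inside a burst is not lost when no further event arrives. With the modulo split, every master can react to the same notification but only picks up the commands whose id maps to its own index, which is one way to avoid the skew described above.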

