Posted to commits@dolphinscheduler.apache.org by GitBox <gi...@apache.org> on 2022/03/18 06:11:36 UTC

[GitHub] [dolphinscheduler] github-actions[bot] commented on issue #8980: [Bug] [Master] master can repeat processing command

github-actions[bot] commented on issue #8980:
URL: https://github.com/apache/dolphinscheduler/issues/8980#issuecomment-1072063542


   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### What happened
   
   Question one: When a master goes online or offline, the other active masters execute the `updateMasterNodes` method. It first resets `MASTER_SLOT` to zero, then acquires a lock from ZooKeeper and executes `syncMasterNodes` serially; only `syncMasterNodes` reinitializes `MASTER_SIZE` and `MASTER_SLOT`.  
   So while the masters are waiting for the lock, each of them holds a `MASTER_SIZE` and `MASTER_SLOT` that still look valid and have the same values, which means they are likely to scan the same commands. Even though `command2ProcessInstance` double-checks the slot, the check still passes if the values have not been updated before it runs, so the command is processed more than once.
   
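   (If a master goes online or offline, the other masters are notified and execute the updateMasterNodes method. It first sets MASTER_SLOT=0 (0 is a valid value), and the masters then compete serially for the zk lock; only the master holding the lock executes syncMasterNodes, and only there is MASTER_SIZE changed. In other words, the masters that have not yet acquired the lock all have the same MASTER_SIZE and a MASTER_SLOT of 0, so at that point they all scan the same commands. Although there is a second check before a command is converted to a process instance, a master that has not yet acquired the lock still has unchanged values, so the check passes and multiple masters process the same command.)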
   
   ![image](https://user-images.githubusercontent.com/29919212/158941329-15ec042f-eab7-48cd-a048-cf0623a7078a.png)
   
   ![image](https://user-images.githubusercontent.com/29919212/158941383-a39898ed-79b7-4890-bd36-42db749b9074.png)
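
The overlap described in question one can be illustrated with a small, self-contained sketch. It is not the DolphinScheduler code: the `MasterView` class, the `owners` helper, and the ownership rule `commandId % masterSize == slot` are assumptions made for illustration, and the ZooKeeper lock is left out entirely. It only models the state of masters that have already reset their slot to 0 but have not yet re-synced.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SlotRaceSketch {

    /** Hypothetical stand-in for one master's cached MASTER_SIZE and MASTER_SLOT. */
    static final class MasterView {
        final String name;
        int masterSize;
        int slot;

        MasterView(String name, int masterSize, int slot) {
            this.name = name;
            this.masterSize = masterSize;
            this.slot = slot;
        }

        /** Assumed slot check: the command belongs to this master if id % size == slot. */
        boolean owns(int commandId) {
            return masterSize > 0 && commandId % masterSize == slot;
        }
    }

    /** Names of all masters whose slot check passes for the given command id. */
    static List<String> owners(List<MasterView> masters, int commandId) {
        List<String> result = new ArrayList<>();
        for (MasterView m : masters) {
            if (m.owns(commandId)) {
                result.add(m.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Steady state: three masters with distinct slots.
        List<MasterView> masters = Arrays.asList(
                new MasterView("m1", 3, 0),
                new MasterView("m2", 3, 1),
                new MasterView("m3", 3, 2));
        System.out.println(owners(masters, 6)); // [m1]

        // Membership change: every master resets its slot to 0 before it gets
        // the lock, while its cached master size is still the old value.
        for (MasterView m : masters) {
            m.slot = 0;
        }
        System.out.println(owners(masters, 6)); // [m1, m2, m3]
    }
}
```

Under these assumptions, a later double-check that uses the same rule cannot reject the command, because it reads the same unchanged values.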
   
   
   Question two: Because masters process commands at different times, and a command is deleted once it has been processed, a master that queries commands page by page will skip part of the data when it fetches the next page.
   
   ![image](https://user-images.githubusercontent.com/29919212/158946899-08914c95-79c8-4a26-bbf9-878f334a3f31.png)
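
The skipping effect in question two can be reproduced with a minimal simulation of offset-based paging over a table whose rows are deleted between page fetches. This is a simplification with hypothetical names (`page`, `scan`): a single scanner deletes each page right after reading it, whereas in the reported scenario the deletions happen at different times on different masters. The mechanism is the same: the offset advances while the remaining rows shift forward.

```java
import java.util.ArrayList;
import java.util.List;

public class PagingSkipSketch {

    /** Offset-based page over the current table contents (pageNumber starts at 0). */
    static List<Integer> page(List<Integer> table, int pageNumber, int pageSize) {
        int from = Math.min(pageNumber * pageSize, table.size());
        int to = Math.min(from + pageSize, table.size());
        return new ArrayList<>(table.subList(from, to));
    }

    /** Scans command ids 1..total page by page, deleting each page once it is processed. */
    static List<Integer> scan(int total, int pageSize) {
        List<Integer> table = new ArrayList<>();
        for (int id = 1; id <= total; id++) {
            table.add(id);
        }
        List<Integer> seen = new ArrayList<>();
        for (int pageNumber = 0; ; pageNumber++) {
            List<Integer> current = page(table, pageNumber, pageSize);
            if (current.isEmpty()) {
                break;
            }
            seen.addAll(current);
            // Processed commands are deleted, so the remaining rows shift forward
            // while the next query still uses a larger offset.
            table.removeAll(current);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(scan(6, 2)); // [1, 2, 5, 6]; commands 3 and 4 are skipped in this pass
    }
}
```

In this sketch the skipped commands are only picked up by a later scan that starts again from the first page, which is one way the order of generated process instances can drift.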
   
   
   ### What you expected to happen
   
   1. A command is not processed more than once.
   2. As far as possible, process instances are generated in order.
   
   ### How to reproduce
   
   Sorry, I don't have enough nodes to test this, but I can suggest a plan:
   1. Prepare four master nodes.
   2. Sleep for 30 seconds after acquiring the lock.
   3. Insert a large number of commands.
   
   ### Anything else
   
   _No response_
   
   ### Version
   
   dev
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org