Posted to notifications@shardingsphere.apache.org by "sparrowzoo (via GitHub)" <gi...@apache.org> on 2023/04/25 09:00:37 UTC

[GitHub] [shardingsphere-elasticjob] sparrowzoo commented on issue #1549: when elastic job instance is suspended(can't communicate with leader,but not dead), even if "monitorExecution" is true, it cannot guarantee that the same sharding will only run on one instance at the same time

sparrowzoo commented on issue #1549:
URL: https://github.com/apache/shardingsphere-elasticjob/issues/1549#issuecomment-1521429262

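   > @terrymanu @TeslaCN Reproducing a scenario in elastic-job-lite where idempotency fails
   > 
   > 1. Development tool: IntelliJ IDEA. When the IDEA debugger hits a breakpoint it suspends all threads, so the ZooKeeper client cannot send heartbeats.
   > 2. Configure three shards, enable idempotency (`monitorExecution` is true), and start three instances from IDEA.
   > 3. Set a breakpoint in the instance that runs shardingItem == 1, at the following position: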
   > 
   > ```
   > // `log` is assumed to be an SLF4J logger declared elsewhere (for example via Lombok's @Slf4j).
   > public class StudyMonitorExecutionJob implements SimpleJob {
   > 
   >   private int i = 0;
   > 
   >   @Override
   >   public void execute(ShardingContext shardingContext) {
   >     // Set the breakpoint here
   >     log.info("StudySimpleJob ShardingItem: {}", shardingContext.getShardingItem());
   >     i++;
   >     try {
   >       Thread.sleep(20_000);
   >     } catch (InterruptedException e) {
   >       log.info("StudySimpleJob ShardingItem ------------------- down1");
   >     }
   >   }
   > }
   > ```
   > 
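   > 4. After a while, the instance stopped at the breakpoint goes offline and the other instances reshard.
   > 5. After resharding, the instance stopped at the breakpoint resumes the job. Two instances are now executing the job for shardingItem == 1, so idempotency briefly fails.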
   
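   This setup simulates a stop-the-world (STW) pause, so in theory the problem is reproducible.
   Here is a suggestion for discussion:
   **Assume the heartbeat interval between the job node and ZooKeeper is 2 seconds.**
   
   Consider running a monitor thread on each job node (maintaining a heartbeat, for example every 2 seconds) that watches the state of communication with ZooKeeper.
   If the monitor thread receives no response from ZooKeeper within 5 seconds, then by that point the ephemeral node has already been deleted and the resharding flag has been set.
   The monitor thread can then kill the local job thread.
   The leader node then waits N seconds (N > 5) before it performs the resharding.
   
   This ensures that the same sharding item is not executed by more than one instance at the same time across the cluster.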
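
A minimal sketch of the monitor-thread idea, under stated assumptions. `HeartbeatWatchdog`, `recordAck`, `check` and `start` are hypothetical names and not part of ElasticJob or Curator. "Killing" the job thread is approximated with `Thread.interrupt()`, and the ZooKeeper acknowledgement is modelled as a plain method call. In a real deployment it might be driven by something like Curator's `ConnectionStateListener`, which is not shown here.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical helper for illustration only; not an ElasticJob or Curator class.
class HeartbeatWatchdog {

    private final Thread jobThread;        // thread executing the local sharding item
    private final long timeoutNanos;       // e.g. 5 seconds, as in the proposal
    private final LongSupplier clock;      // monotonic clock, e.g. System::nanoTime
    private final AtomicLong lastAckNanos; // time of the last acknowledged heartbeat
    private final AtomicBoolean fenced = new AtomicBoolean(false);

    HeartbeatWatchdog(Thread jobThread, long timeoutMillis, LongSupplier clock) {
        this.jobThread = jobThread;
        this.timeoutNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        this.clock = clock;
        this.lastAckNanos = new AtomicLong(clock.getAsLong());
    }

    // Call whenever the registry center acknowledges a heartbeat.
    void recordAck() {
        lastAckNanos.set(clock.getAsLong());
    }

    // Interrupts the job thread once if the silence exceeds the timeout.
    // Returns true if the job thread has been fenced off.
    boolean check() {
        boolean silentTooLong = clock.getAsLong() - lastAckNanos.get() > timeoutNanos;
        if (silentTooLong && fenced.compareAndSet(false, true)) {
            jobThread.interrupt();
        }
        return fenced.get();
    }

    // Runs check() periodically on a daemon thread, e.g. every 2 seconds.
    ScheduledExecutorService start(long periodMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "heartbeat-watchdog");
            t.setDaemon(true);
            return t;
        });
        scheduler.scheduleAtFixedRate(this::check, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

With the job quoted above, an interrupt makes `Thread.sleep` throw `InterruptedException`, so this only stops jobs that cooperate with interruption. Note also that during a real STW pause the watchdog thread is frozen too: it can only notice the gap on its first check after the pause ends, so a short overlap window may remain even if the leader waits longer than the timeout.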
   

