Posted to notifications@shardingsphere.apache.org by GitBox <gi...@apache.org> on 2020/10/09 02:50:33 UTC

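[GitHub] [shardingsphere-elasticjob] sparrowzoo opened a new issue #1549: When an ElasticJob (ej) instance is hung but not dead, enabling monitor execution still cannot guarantee that the same sharding item runs on only one instance at any given time.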

sparrowzoo opened a new issue #1549:
URL: https://github.com/apache/shardingsphere-elasticjob/issues/1549


    public void shardingIfNecessary() {
        List<JobInstance> availableJobInstances = instanceService.getAvailableJobInstances();
        if (!isNeedSharding() || availableJobInstances.isEmpty()) {
            return;
        }

        if (!leaderService.isLeaderUntilBlock()) {
            // Non-leader: block until sharding is confirmed complete
            blockUntilShardingCompleted();
            return;
        }
        // Wait for all job items to finish, based on the ephemeral `running` nodes (a hung instance is possible here)
        waitingOtherShardingItemCompleted();
        LiteJobConfiguration liteJobConfig = configService.load(false);
        int shardingTotalCount = liteJobConfig.getTypeConfig().getCoreConfig().getShardingTotalCount();
        log.debug("Job '{}' sharding begin.", jobName);
        jobNodeStorage.fillEphemeralJobNode(ShardingNode.PROCESSING, "");
        resetShardingInfo(shardingTotalCount);
        JobShardingStrategy jobShardingStrategy = JobShardingStrategyFactory.getStrategy(liteJobConfig.getJobShardingStrategyClass());
        jobNodeStorage.executeInTransaction(new PersistShardingInfoTransactionExecutionCallback(jobShardingStrategy.sharding(availableJobInstances, jobName, shardingTotalCount)));
        log.debug("Job '{}' sharding complete.", jobName);
    }


    private void waitingOtherShardingItemCompleted() {
        while (executionService.hasRunningItems()) {
            log.debug("Job '{}' sleep short time until other job completed.", jobName);
            BlockUtils.waitingShortTime();
        }
    }


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [shardingsphere-elasticjob] terrymanu commented on issue #1549: when elastic job instance is suspended(can't communicate with leader,but not dead), even if "monitorExecution" is true, it cannot guarantee that the same sharding will only run on one instance at the same time

Posted by GitBox <gi...@apache.org>.
terrymanu commented on issue #1549:
URL: https://github.com/apache/shardingsphere-elasticjob/issues/1549#issuecomment-705953764


   I don't get your point. Can you explain the reason for opening this issue?





[GitHub] [shardingsphere-elasticjob] terrymanu edited a comment on issue #1549: when elastic job instance is suspended(can't communicate with leader,but not dead), even if "monitorExecution" is true, it cannot guarantee that the same sharding will only run on one instance at the same time

Posted by GitBox <gi...@apache.org>.
terrymanu edited a comment on issue #1549:
URL: https://github.com/apache/shardingsphere-elasticjob/issues/1549#issuecomment-705953764


   I don't get your point. Can you explain the context and the reason for opening this issue?





[GitHub] [shardingsphere-elasticjob] terrymanu commented on issue #1549: when elastic job instance is suspended(can't communicate with leader,but not dead), even if "monitorExecution" is true, it cannot guarantee that the same sharding will only run on one instance at the same time

Posted by GitBox <gi...@apache.org>.
terrymanu commented on issue #1549:
URL: https://github.com/apache/shardingsphere-elasticjob/issues/1549#issuecomment-723524684


   The job will be paused if the instance can not communicate with reg center.
   Any problem with it?





[GitHub] [shardingsphere-elasticjob] Wzy19930507 commented on issue #1549: when elastic job instance is suspended(can't communicate with leader,but not dead), even if "monitorExecution" is true, it cannot guarantee that the same sharding will only run on one instance at the same time

Posted by GitBox <gi...@apache.org>.
Wzy19930507 commented on issue #1549:
URL: https://github.com/apache/shardingsphere-elasticjob/issues/1549#issuecomment-1054356804


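   @terrymanu   Reproducing a scenario where idempotency fails

   1. Use IntelliJ IDEA: when the IDEA debugger stops at a breakpoint it suspends all threads, so the ZooKeeper client can no longer send heartbeats.
   2. Three sharding items, idempotency (monitorExecution) enabled, three instances started from IDEA.
   3. Set a breakpoint in one of the instances at the following location:
   ```
   public class StudyMonitorExecutionJob implements SimpleJob {

     private int i = 0;

     @Override
     public void execute(ShardingContext shardingContext) {
       // Set the breakpoint here in one of the instances
       log.info("StudySimpleJob ShardingItem: {}", shardingContext.getShardingItem());
       i++;
       try {
         Thread.sleep(20_000);
       } catch (InterruptedException e) {
         log.info("StudySimpleJob ShardingItem ------------------- down1");
       }
     }
   }
   ```
   4. After a while, the instance stopped at the breakpoint goes offline and the other instances reshard.
   5. After resharding, the instance stopped at the breakpoint is resumed and can carry on running the job, so idempotency is broken.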


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@shardingsphere.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [shardingsphere-elasticjob] Wzy19930507 edited a comment on issue #1549: when elastic job instance is suspended(can't communicate with leader,but not dead), even if "monitorExecution" is true, it cannot guarantee that the same sharding will only run on one instance at the same time

Posted by GitBox <gi...@apache.org>.
Wzy19930507 edited a comment on issue #1549:
URL: https://github.com/apache/shardingsphere-elasticjob/issues/1549#issuecomment-1054356804


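   @terrymanu  Reproducing an idempotency failure in elastic-job-lite

   1. Development tool: IntelliJ IDEA. When the IDEA debugger stops at a breakpoint it suspends all threads, so the ZooKeeper client cannot send heartbeats.
   2. Three sharding items, idempotency (monitorExecution) enabled, three instances started from IDEA.
   3. Set a breakpoint in one of the instances at the following location:
   ```
   public class StudyMonitorExecutionJob implements SimpleJob {

     private int i = 0;

     @Override
     public void execute(ShardingContext shardingContext) {
       // Set the breakpoint here in one of the instances
       log.info("StudySimpleJob ShardingItem: {}", shardingContext.getShardingItem());
       i++;
       try {
         Thread.sleep(20_000);
       } catch (InterruptedException e) {
         log.info("StudySimpleJob ShardingItem ------------------- down1");
       }
     }
   }
   ```
   4. After a while, the instance stopped at the breakpoint goes offline and the other instances reshard.
   5. After resharding, the instance stopped at the breakpoint carries on running the job, so idempotency is broken.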





[GitHub] [shardingsphere-elasticjob] sparrowzoo commented on issue #1549: when elastic job instance is suspended(can't communicate with leader,but not dead), even if "monitorExecution" is true, it cannot guarantee that the same sharding will only run on one instance at the same time

Posted by GitBox <gi...@apache.org>.
sparrowzoo commented on issue #1549:
URL: https://github.com/apache/shardingsphere-elasticjob/issues/1549#issuecomment-843133163


   > The job will be paused if the instance can not communicate with reg center.
   > Any problem with it?
   
   Does “job paused” mean that the business thread of the sharding instance is stopped?





[GitHub] [shardingsphere-elasticjob] sparrowzoo commented on issue #1549: when elastic job instance is suspended(can't communicate with leader,but not dead), even if "monitorExecution" is true, it cannot guarantee that the same sharding will only run on one instance at the same time

Posted by GitBox <gi...@apache.org>.
sparrowzoo commented on issue #1549:
URL: https://github.com/apache/shardingsphere-elasticjob/issues/1549#issuecomment-705970133


   The context is:
   t1. Instance 'A' can't communicate with the leader but is not dead. Its task keeps running, `possibly for a long time`.
   t2. The resharding flag is triggered.
   t3. Resharding assigns the sharding item of instance 'A' to another instance.
   ```
   // ZooKeeper ephemeral node
   private void waitingOtherShardingItemCompleted() {
       while (executionService.hasRunningItems()) {
           log.debug("Job '{}' sleep short time until other job completed.", jobName);
           BlockUtils.waitingShortTime();
       }
   }
   ```
   t4. The job starts running.
   t5. The sharding item of instance 'A' now runs twice at the same time.
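
The t1 to t5 timeline can be replayed with a small in-memory model. This is only a sketch under stated assumptions: `FakeRegistry`, `markRunning`, `expireSession` and the session names are hypothetical stand-ins, not ElasticJob or ZooKeeper API, and the model assumes that the `running` marker of instance 'A' is an ephemeral node that vanishes once A's session expires.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** In-memory replay of the t1..t5 timeline; no ZooKeeper or ElasticJob classes are used. */
final class SuspendedInstanceSketch {

    /** Hypothetical registry stand-in: one ephemeral "running" marker per sharding item. */
    static final class FakeRegistry {
        // sharding item -> session that created the ephemeral marker
        private final Map<Integer, String> runningMarkers = new HashMap<>();

        void markRunning(int item, String session) {
            runningMarkers.put(item, session);
        }

        /** Session expiry deletes every ephemeral marker created by that session. */
        void expireSession(String session) {
            runningMarkers.values().removeIf(session::equals);
        }

        boolean hasRunningItems() {
            return !runningMarkers.isEmpty();
        }
    }

    /** Replays the timeline and returns the instances that execute item 1 at t5. */
    static Set<String> replay() {
        FakeRegistry registry = new FakeRegistry();
        Set<String> executingItem1 = new HashSet<>();

        // t1: A starts item 1, then loses contact; its business thread keeps running.
        registry.markRunning(1, "session-A");
        executingItem1.add("A");
        // Assumption: A's session expires, which removes its ephemeral marker.
        registry.expireSession("session-A");

        // t2: resharding is flagged as necessary.
        boolean reshardingNeeded = true;

        // t3: the leader waits while any item is marked running; none is, so it proceeds.
        if (reshardingNeeded && !registry.hasRunningItems()) {
            // t4: item 1 now belongs to B, which starts running it.
            registry.markRunning(1, "session-B");
            executingItem1.add("B");
        }

        // t5: A never stopped, so item 1 is executing on both A and B.
        return executingItem1;
    }

    public static void main(String[] args) {
        System.out.println("Instances executing item 1: " + replay());
    }
}
```

In this model the wait loop cannot tell "A has finished" apart from "A's marker disappeared while A is still working", which is the gap the timeline describes.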





[GitHub] [shardingsphere-elasticjob] Wzy19930507 edited a comment on issue #1549: when elastic job instance is suspended(can't communicate with leader,but not dead), even if "monitorExecution" is true, it cannot guarantee that the same sharding will only run on one instance at the same time

Posted by GitBox <gi...@apache.org>.
Wzy19930507 edited a comment on issue #1549:
URL: https://github.com/apache/shardingsphere-elasticjob/issues/1549#issuecomment-1054356804


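   @terrymanu  Reproducing an idempotency failure in elastic-job-lite

   1. Development tool: IntelliJ IDEA. When the IDEA debugger stops at a breakpoint it suspends all threads, so the ZooKeeper client cannot send heartbeats.
   2. Three sharding items, idempotency (monitorExecution) enabled, three instances started from IDEA.
   3. Set a breakpoint in the instance with shardingitem==1 at the following location:
   ```
   public class StudyMonitorExecutionJob implements SimpleJob {

     private int i = 0;

     @Override
     public void execute(ShardingContext shardingContext) {
       // Set the breakpoint here
       log.info("StudySimpleJob ShardingItem: {}", shardingContext.getShardingItem());
       i++;
       try {
         Thread.sleep(20_000);
       } catch (InterruptedException e) {
         log.info("StudySimpleJob ShardingItem ------------------- down1");
       }
     }
   }
   ```
   4. After a while, the instance stopped at the breakpoint goes offline and the other instances reshard.
   5. After resharding, the instance stopped at the breakpoint carries on running the job. At that point two instances are running the shardingitem==1 job, so idempotency is briefly broken.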





[GitHub] [shardingsphere-elasticjob] Wzy19930507 edited a comment on issue #1549: when elastic job instance is suspended(can't communicate with leader,but not dead), even if "monitorExecution" is true, it cannot guarantee that the same sharding will only run on one instance at the same time

Posted by GitBox <gi...@apache.org>.
Wzy19930507 edited a comment on issue #1549:
URL: https://github.com/apache/shardingsphere-elasticjob/issues/1549#issuecomment-1054356804


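   @terrymanu @TeslaCN  Reproducing an idempotency failure in elastic-job-lite

   1. Development tool: IntelliJ IDEA. When the IDEA debugger stops at a breakpoint it suspends all threads, so the ZooKeeper client cannot send heartbeats.
   2. Three sharding items, idempotency (monitorExecution) enabled, three instances started from IDEA.
   3. Set a breakpoint in the instance with shardingitem==1 at the following location:
   ```
   public class StudyMonitorExecutionJob implements SimpleJob {

     private int i = 0;

     @Override
     public void execute(ShardingContext shardingContext) {
       // Set the breakpoint here
       log.info("StudySimpleJob ShardingItem: {}", shardingContext.getShardingItem());
       i++;
       try {
         Thread.sleep(20_000);
       } catch (InterruptedException e) {
         log.info("StudySimpleJob ShardingItem ------------------- down1");
       }
     }
   }
   ```
   4. After a while, the instance stopped at the breakpoint goes offline and the other instances reshard.
   5. After resharding, the instance stopped at the breakpoint carries on running the job. At that point two instances are running the shardingitem==1 job, so idempotency is briefly broken.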
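
Steps 1 and 4 depend on how long the instance stays suspended relative to the registry's session timeout. The sketch below models only that timing, assuming the session is expired once no heartbeat has arrived for longer than the session timeout. The 30-second timeout, the pause lengths and the method names are illustrative assumptions, not values or API taken from ElasticJob or ZooKeeper.

```java
/** Timing model for the breakpoint scenario; all names and numbers are hypothetical. */
final class BreakpointTimeoutSketch {

    /** Assumed expiry rule: the last heartbeat is older than the session timeout. */
    static boolean sessionExpired(long lastHeartbeatMillis, long nowMillis, long sessionTimeoutMillis) {
        return nowMillis - lastHeartbeatMillis > sessionTimeoutMillis;
    }

    /**
     * True if an instance that was suspended for pausedForMillis wakes up after its
     * session has already expired, i.e. after other instances may have resharded.
     */
    static boolean resumesAfterExpiry(long pausedForMillis, long sessionTimeoutMillis) {
        // Every thread is suspended at the breakpoint, so the last heartbeat was sent at time 0.
        return sessionExpired(0L, pausedForMillis, sessionTimeoutMillis);
    }

    public static void main(String[] args) {
        long sessionTimeoutMillis = 30_000L; // hypothetical timeout

        // Short pause: the session survives, so no resharding is triggered by the pause.
        System.out.println(resumesAfterExpiry(5_000L, sessionTimeoutMillis));

        // Long pause: the session is gone by the time the job code continues.
        System.out.println(resumesAfterExpiry(60_000L, sessionTimeoutMillis));
    }
}
```

Only the second case matches the reproduction: the job code resumes without knowing that its session expired while it was suspended.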

