You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@dolphinscheduler.apache.org by GitBox <gi...@apache.org> on 2021/10/27 10:11:01 UTC

[GitHub] [dolphinscheduler] caishunfeng opened a new issue #6616: [Bug] [Worker] Worker fakes death when it stop itself fail.

caishunfeng opened a new issue #6616:
URL: https://github.com/apache/dolphinscheduler/issues/6616


   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### What happened
   
   When I try a stress test, I found that worker fakes death and print nothing to log file. At the same time, worker is not exist in zk node path and Master can't dispatch task because no worker.
   
   That's the log before worker stop:
   '''
   [ERROR] 2021-10-15 15:10:57.590 org.apache.dolphinscheduler.server.worker.WorkerServer:[223] - worker server stop exception 
   org.apache.dolphinscheduler.spi.register.RegistryException: zookeeper delete key error
           at org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperRegistry.delete(ZookeeperRegistry.java:272)
           at org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperRegistry.remove(ZookeeperRegistry.java:199)
           at org.apache.dolphinscheduler.service.registry.RegistryCenter.remove(RegistryCenter.java:157)
           at org.apache.dolphinscheduler.server.worker.registry.WorkerRegistryClient.unRegistry(WorkerRegistryClient.java:128)
           at org.apache.dolphinscheduler.server.worker.WorkerServer.close(WorkerServer.java:219)
           at org.apache.dolphinscheduler.server.worker.WorkerServer.stop(WorkerServer.java:229)
           at org.apache.dolphinscheduler.server.registry.HeartBeatTask.run(HeartBeatTask.java:81)
           at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
           at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
           at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
           at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:748)
   Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /dolphinscheduler/nodes/worker/default/172.28.132.15:1234
           at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
           at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
           at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:882)
           at org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:274)
           at org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:268)
           at org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:67)
           at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:81)
           at org.apache.curator.framework.imps.DeleteBuilderImpl.pathInForeground(DeleteBuilderImpl.java:265)
           at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:249)
           at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:34)
           at org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperRegistry.delete(ZookeeperRegistry.java:267)
           ... 13 common frames omitted
   '''
   
   Maybe set stop single to true after close zk and netty successfully is better.
   
   ### What you expected to happen
   
   worker can stop itself successfully when it is judged dead server.
   
   ### How to reproduce
   
   do some stress test with many task running.
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [dolphinscheduler] github-actions[bot] commented on issue #6616: [Bug] [Worker] Worker fakes death when it stop itself fail.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #6616:
URL: https://github.com/apache/dolphinscheduler/issues/6616#issuecomment-952760859


   Hi:
   * Thank you for your feedback, we have received your issue, Please wait patiently for a reply.
   * In order for us to understand your request as soon as possible, please provide detailed information、version or pictures.
   * If you haven't received a reply for a long time, you can subscribe to the developer's email,Mail subscription steps reference https://dolphinscheduler.apache.org/zh-cn/community/development/subscribe.html ,Then write the issue URL in the email content and send question to dev@dolphinscheduler.apache.org.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [dolphinscheduler] lenboo closed issue #6616: [Bug] [Worker] Worker fakes death when it stop itself fail.

Posted by GitBox <gi...@apache.org>.
lenboo closed issue #6616:
URL: https://github.com/apache/dolphinscheduler/issues/6616


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org