You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@dolphinscheduler.apache.org by GitBox <gi...@apache.org> on 2021/12/13 06:23:24 UTC

[GitHub] [dolphinscheduler] zwZjut opened a new issue #7361: [Bug] [dolphinscheduler-server] MasterServer not die as expected

zwZjut opened a new issue #7361:
URL: https://github.com/apache/dolphinscheduler/issues/7361


   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### What happened
   
   the process of MasterServer is still active, but  it should be stopped.
   now it need  manual restart.
   
   logs:
   [INFO] 2021-12-13 12:14:51.254 org.apache.dolphinscheduler.server.master.MasterServer:[196] - master server is stopping ..., cause : i was judged to death, release resources and stop myself
   /opt/apache-dolphinscheduler-2.0.1-bin/logs # grep -A 10000 'master server is stopping' dolphinscheduler-master.log
   [INFO] 2021-12-13 12:14:51.254 org.apache.dolphinscheduler.server.master.MasterServer:[196] - master server is stopping ..., cause : i was judged to death, release resources and stop myself
   [WARN] 2021-12-13 12:14:51.570 org.apache.dolphinscheduler.server.master.dispatch.host.LowerWeightHostManager:[161] - worker honghuo-worker-1.honghuo-worker-headless:1234 current cpu load average 11.18 is too high or available memory 8.08G is too low
   [WARN] 2021-12-13 12:14:51.571 org.apache.dolphinscheduler.server.master.dispatch.host.LowerWeightHostManager:[161] - worker honghuo-worker-0.honghuo-worker-headless:1234 current cpu load average 9.54 is too high or available memory 6.27G is too low
   [WARN] 2021-12-13 12:14:52.571 org.apache.dolphinscheduler.server.master.dispatch.host.LowerWeightHostManager:[161] - worker honghuo-worker-1.honghuo-worker-headless:1234 current cpu load average 10.77 is too high or available memory 8.17G is too low
   [WARN] 2021-12-13 12:14:52.572 org.apache.dolphinscheduler.server.master.dispatch.host.LowerWeightHostManager:[161] - worker honghuo-worker-0.honghuo-worker-headless:1234 current cpu load average 9.54 is too high or available memory 6.27G is too low
   [WARN] 2021-12-13 12:14:53.572 org.apache.dolphinscheduler.server.master.dispatch.host.LowerWeightHostManager:[161] - worker honghuo-worker-1.honghuo-worker-headless:1234 current cpu load average 10.77 is too high or available memory 8.17G is too low
   [WARN] 2021-12-13 12:14:53.573 org.apache.dolphinscheduler.server.master.dispatch.host.LowerWeightHostManager:[161] - worker honghuo-worker-0.honghuo-worker-headless:1234 current cpu load average 9.54 is too high or available memory 6.27G is too low
   [INFO] 2021-12-13 12:14:54.260 org.apache.dolphinscheduler.remote.NettyRemotingClient:[390] - netty client closed
   [INFO] 2021-12-13 12:14:54.263 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerService:[159] - master schedule service stopped...
   [INFO] 2021-12-13 12:14:54.265 org.apache.dolphinscheduler.remote.NettyRemotingServer:[243] - netty server closed
   [INFO] 2021-12-13 12:14:54.270 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[437] - master node : honghuo-master-1.honghuo-master-headless:5678 unRegistry to register center.
   [INFO] 2021-12-13 12:14:54.270 org.apache.dolphinscheduler.server.master.registry.ServerNodeManager:[286] - master node : /nodes/master/honghuo-master-1.honghuo-master-headless:5678 down.
   [INFO] 2021-12-13 12:14:54.271 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[439] - heartbeat executor shutdown
   [INFO] 2021-12-13 12:14:54.274 org.apache.curator.framework.imps.CuratorFrameworkImpl:[955] - backgroundOperationsLoop exiting
   [ERROR] 2021-12-13 12:14:54.274 org.apache.dolphinscheduler.server.master.registry.ServerNodeManager:[307] - update master nodes error
   org.apache.dolphinscheduler.registry.api.RegistryException: zookeeper release lock error
   	at org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperRegistry.acquireLock(ZookeeperRegistry.java:214)
   	at org.apache.dolphinscheduler.service.registry.RegistryClient.getLock(RegistryClient.java:237)
   	at org.apache.dolphinscheduler.server.master.registry.ServerNodeManager.updateMasterNodes(ServerNodeManager.java:302)
   	at org.apache.dolphinscheduler.server.master.registry.ServerNodeManager.access$700(ServerNodeManager.java:67)
   	at org.apache.dolphinscheduler.server.master.registry.ServerNodeManager$MasterDataListener.notify(ServerNodeManager.java:287)
   	at org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperRegistry.lambda$subscribe$1(ZookeeperRegistry.java:127)
   	at org.apache.curator.framework.recipes.cache.TreeCache$2.apply(TreeCache.java:760)
   	at org.apache.curator.framework.recipes.cache.TreeCache$2.apply(TreeCache.java:754)
   	at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:100)
   	at org.apache.curator.shaded.com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
   	at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:92)
   	at org.apache.curator.framework.recipes.cache.TreeCache.callListeners(TreeCache.java:753)
   	at org.apache.curator.framework.recipes.cache.TreeCache.access$1900(TreeCache.java:75)
   	at org.apache.curator.framework.recipes.cache.TreeCache$4.run(TreeCache.java:865)
   	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
   	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:748)
   Caused by: java.io.IOException: Lost connection while trying to acquire lock: /lock/masters
   	at org.apache.curator.framework.recipes.locks.InterProcessMutex.acquire(InterProcessMutex.java:91)
   	at org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperRegistry.acquireLock(ZookeeperRegistry.java:203)
   	... 18 common frames omitted
   [ERROR] 2021-12-13 12:14:54.275 org.apache.dolphinscheduler.server.master.registry.ServerNodeManager:[291] - MasterNodeListener capture data change and get data failed.
   java.lang.NullPointerException: null
   	at org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperRegistry.releaseLock(ZookeeperRegistry.java:221)
   	at org.apache.dolphinscheduler.service.registry.RegistryClient.releaseLock(RegistryClient.java:241)
   	at org.apache.dolphinscheduler.server.master.registry.ServerNodeManager.updateMasterNodes(ServerNodeManager.java:309)
   	at org.apache.dolphinscheduler.server.master.registry.ServerNodeManager.access$700(ServerNodeManager.java:67)
   	at org.apache.dolphinscheduler.server.master.registry.ServerNodeManager$MasterDataListener.notify(ServerNodeManager.java:287)
   	at org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperRegistry.lambda$subscribe$1(ZookeeperRegistry.java:127)
   	at org.apache.curator.framework.recipes.cache.TreeCache$2.apply(TreeCache.java:760)
   	at org.apache.curator.framework.recipes.cache.TreeCache$2.apply(TreeCache.java:754)
   	at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:100)
   	at org.apache.curator.shaded.com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
   	at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:92)
   	at org.apache.curator.framework.recipes.cache.TreeCache.callListeners(TreeCache.java:753)
   	at org.apache.curator.framework.recipes.cache.TreeCache.access$1900(TreeCache.java:75)
   	at org.apache.curator.framework.recipes.cache.TreeCache$4.run(TreeCache.java:865)
   	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
   	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:748)
   [INFO] 2021-12-13 12:14:54.279 org.apache.zookeeper.ZooKeeper:[693] - Session: 0x20001523448001d closed
   [INFO] 2021-12-13 12:14:54.280 org.apache.zookeeper.ClientCnxn:[522] - EventThread shut down for session: 0x20001523448001d
   [INFO] 2021-12-13 12:14:54.280 org.quartz.core.QuartzScheduler:[666] - Scheduler DolphinScheduler_$_honghuo-master-1.honghuo-master-headless.honghuo.svc.cluster.local1639368841996 shutting down.
   [INFO] 2021-12-13 12:14:54.281 org.quartz.core.QuartzScheduler:[585] - Scheduler DolphinScheduler_$_honghuo-master-1.honghuo-master-headless.honghuo.svc.cluster.local1639368841996 paused.
   [INFO] 2021-12-13 12:14:54.283 com.zaxxer.hikari.HikariDataSource:[350] - DolphinScheduler - Shutdown initiated...
   [INFO] 2021-12-13 12:14:54.294 com.zaxxer.hikari.HikariDataSource:[352] - DolphinScheduler - Shutdown completed.
   [INFO] 2021-12-13 12:14:54.296 org.quartz.core.QuartzScheduler:[740] - Scheduler DolphinScheduler_$_honghuo-master-1.honghuo-master-headless.honghuo.svc.cluster.local1639368841996 shutdown complete.
   [INFO] 2021-12-13 12:14:54.296 org.apache.dolphinscheduler.service.quartz.QuartzExecutors:[210] - Quartz service stopped, and halt all tasks
   [INFO] 2021-12-13 12:14:54.297 org.apache.dolphinscheduler.server.master.MasterServer:[214] - Quartz service stopped
   [INFO] 2021-12-13 12:14:54.302 org.quartz.core.QuartzScheduler:[585] - Scheduler quartzScheduler_$_NON_CLUSTERED paused.
   [INFO] 2021-12-13 12:14:54.307 org.apache.dolphinscheduler.remote.NettyRemotingClient:[390] - netty client closed
   [INFO] 2021-12-13 12:14:54.307 org.apache.dolphinscheduler.service.log.LogClientService:[74] - logger client closed
   [INFO] 2021-12-13 12:14:54.307 org.springframework.scheduling.quartz.SchedulerFactoryBean:[845] - Shutting down Quartz Scheduler
   [INFO] 2021-12-13 12:14:54.308 org.quartz.core.QuartzScheduler:[666] - Scheduler quartzScheduler_$_NON_CLUSTERED shutting down.
   [INFO] 2021-12-13 12:14:54.308 org.quartz.core.QuartzScheduler:[585] - Scheduler quartzScheduler_$_NON_CLUSTERED paused.
   [INFO] 2021-12-13 12:14:54.309 org.quartz.core.QuartzScheduler:[740] - Scheduler quartzScheduler_$_NON_CLUSTERED shutdown complete.
   [INFO] 2021-12-13 12:14:54.314 org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService:[139] - StateEventResponseWorker stopped
   [WARN] 2021-12-13 12:14:54.314 org.apache.dolphinscheduler.server.master.processor.queue.StateEventResponseService:[115] - persist task error
   java.lang.InterruptedException: null
   	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
   	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048)
   	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
   	at org.apache.dolphinscheduler.server.master.processor.queue.StateEventResponseService$StateEventResponseWorker.run(StateEventResponseService.java:112)
   [INFO] 2021-12-13 12:14:54.315 org.apache.dolphinscheduler.server.master.processor.queue.StateEventResponseService:[120] - StateEventResponseWorker stopped
   
   ### What you expected to happen
   
   but  it should by stopped and process killed when  it judged to death
   
   ### How to reproduce
   
   kill master when it has processes to schedule
   
   ### Anything else
   
   _No response_
   
   ### Version
   
   dev
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [dolphinscheduler] zwZjut commented on issue #7361: [Bug] [dolphinscheduler-server] MasterServer not die as expected

Posted by GitBox <gi...@apache.org>.
zwZjut commented on issue #7361:
URL: https://github.com/apache/dolphinscheduler/issues/7361#issuecomment-992146388


   branch: 2.0.2-prepare


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [dolphinscheduler] zwZjut closed issue #7361: [Bug] [dolphinscheduler-server] MasterServer not die as expected

Posted by GitBox <gi...@apache.org>.
zwZjut closed issue #7361:
URL: https://github.com/apache/dolphinscheduler/issues/7361


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [dolphinscheduler] github-actions[bot] commented on issue #7361: [Bug] [dolphinscheduler-server] MasterServer not die as expected

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #7361:
URL: https://github.com/apache/dolphinscheduler/issues/7361#issuecomment-992146437


   Hi:
   * Thank you for your feedback, we have received your issue, Please wait patiently for a reply.
   * In order for us to understand your request as soon as possible, please provide detailed information、version or pictures.
   * If you haven't received a reply for a long time, you can subscribe to the developer's email,Mail subscription steps reference https://dolphinscheduler.apache.org/en-us/community/development/subscribe.html ,Then write the issue URL in the email content and send question to dev@dolphinscheduler.apache.org.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org