Posted to commits@dolphinscheduler.apache.org by GitBox <gi...@apache.org> on 2022/08/25 02:03:23 UTC

[GitHub] [dolphinscheduler] stalary opened a new issue, #11635: [Bug] [Master-server] Master-server often disconnect with zookeeper due to server down

stalary opened a new issue, #11635:
URL: https://github.com/apache/dolphinscheduler/issues/11635

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### What happened
   
   The master-server frequently disconnects from ZooKeeper, and the server then goes down.
   Log:
   ```
   [INFO] 2022-08-24 14:00:23.957 +0800 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[125] - [WorkflowInstance-0][TaskInstance-0] - MASTER node deleted : /nodes/master/192.168.120.17:5678
   [INFO] 2022-08-24 14:00:23.957 +0800 org.apache.dolphinscheduler.server.master.registry.ServerNodeManager:[302] - [WorkflowInstance-0][TaskInstance-0] - master node : /nodes/master/192.168.120.17:5678 down.
   [INFO] 2022-08-24 14:00:23.959 +0800 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[140] - [WorkflowInstance-0][TaskInstance-0] - path: /nodes/master/192.168.120.17:5678 not exists
   [INFO] 2022-08-24 14:00:23.961 +0800 org.apache.dolphinscheduler.service.registry.RegistryClient:[160] - [WorkflowInstance-0][TaskInstance-0] - MASTER server dead , and /nodes/master/192.168.120.17:5678 added to zk dead server path success
   [INFO] 2022-08-24 14:00:23.962 +0800 org.apache.dolphinscheduler.server.master.service.FailoverService:[53] - [WorkflowInstance-0][TaskInstance-0] - Master failover starting, masterServer: 192.168.120.17:5678
   [INFO] 2022-08-24 14:00:23.971 +0800 org.apache.dolphinscheduler.server.master.registry.ServerNodeManager:[364] - [WorkflowInstance-0][TaskInstance-0] - update master nodes, master size: 2, slot: 1, addr: 192.168.120.19:5678
   [INFO] 2022-08-24 14:00:23.980 +0800 org.apache.dolphinscheduler.server.master.service.FailoverService:[55] - [WorkflowInstance-0][TaskInstance-0] - Master failover finished, masterServer: 192.168.120.17:5678
   [INFO] 2022-08-24 14:00:29.280 +0800 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[125] - [WorkflowInstance-0][TaskInstance-0] - MASTER node deleted : /nodes/master/192.168.120.18:5678
   [INFO] 2022-08-24 14:00:29.280 +0800 org.apache.dolphinscheduler.server.master.registry.ServerNodeManager:[302] - [WorkflowInstance-0][TaskInstance-0] - master node : /nodes/master/192.168.120.18:5678 down.
   [INFO] 2022-08-24 14:00:29.281 +0800 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[140] - [WorkflowInstance-0][TaskInstance-0] - path: /nodes/master/192.168.120.18:5678 not exists
   [INFO] 2022-08-24 14:00:29.287 +0800 org.apache.dolphinscheduler.service.registry.RegistryClient:[160] - [WorkflowInstance-0][TaskInstance-0] - MASTER server dead , and /nodes/master/192.168.120.18:5678 added to zk dead server path success
   [INFO] 2022-08-24 14:00:29.287 +0800 org.apache.dolphinscheduler.server.master.service.FailoverService:[53] - [WorkflowInstance-0][TaskInstance-0] - Master failover starting, masterServer: 192.168.120.18:5678
   [INFO] 2022-08-24 14:00:29.291 +0800 org.apache.dolphinscheduler.server.master.registry.ServerNodeManager:[364] - [WorkflowInstance-0][TaskInstance-0] - update master nodes, master size: 1, slot: 0, addr: 192.168.120.19:5678
   [INFO] 2022-08-24 14:00:29.295 +0800 org.apache.dolphinscheduler.server.master.service.FailoverService:[55] - [WorkflowInstance-0][TaskInstance-0] - Master failover finished, masterServer: 192.168.120.18:5678
   [INFO] 2022-08-24 14:00:34.425 +0800 org.quartz.core.QuartzScheduler:[585] - [WorkflowInstance-0][TaskInstance-0] - Scheduler DolphinScheduler_$_development-91661307661519 paused.
   [INFO] 2022-08-24 14:00:34.433 +0800 org.eclipse.jetty.server.AbstractConnector:[381] - [WorkflowInstance-0][TaskInstance-0] - Stopped ServerConnector@16bd7ae1{HTTP/1.1, (http/1.1)}{0.0.0.0:5679}
   [INFO] 2022-08-24 14:00:34.433 +0800 org.eclipse.jetty.server.session:[149] - [WorkflowInstance-0][TaskInstance-0] - node0 Stopped scavenging
   [INFO] 2022-08-24 14:00:34.434 +0800 org.eclipse.jetty.server.handler.ContextHandler.application:[2347] - [WorkflowInstance-0][TaskInstance-0] - Destroying Spring FrameworkServlet 'dispatcherServlet'
   [INFO] 2022-08-24 14:00:34.435 +0800 org.eclipse.jetty.server.handler.ContextHandler:[1153] - [WorkflowInstance-0][TaskInstance-0] - Stopped o.s.b.w.e.j.JettyEmbeddedWebAppContext@7f2b584b{application,/,[file:///tmp/jetty-docbase.5679.2053901623813332765/],STOPPED}
   [INFO] 2022-08-24 14:00:34.438 +0800 org.apache.dolphinscheduler.server.master.rpc.MasterRPCServer:[109] - [WorkflowInstance-0][TaskInstance-0] - Closing Master RPC Server...
   [INFO] 2022-08-24 14:00:34.440 +0800 org.apache.dolphinscheduler.remote.NettyRemotingServer:[212] - [WorkflowInstance-0][TaskInstance-0] - netty server closed
   [INFO] 2022-08-24 14:00:34.441 +0800 org.apache.dolphinscheduler.server.master.rpc.MasterRPCServer:[111] - [WorkflowInstance-0][TaskInstance-0] - Closed Master RPC Server...
   [INFO] 2022-08-24 14:00:34.441 +0800 org.springframework.scheduling.quartz.SchedulerFactoryBean:[845] - [WorkflowInstance-0][TaskInstance-0] - Shutting down Quartz Scheduler
   [INFO] 2022-08-24 14:00:34.442 +0800 org.quartz.core.QuartzScheduler:[666] - [WorkflowInstance-0][TaskInstance-0] - Scheduler DolphinScheduler_$_development-91661307661519 shutting down.
   [INFO] 2022-08-24 14:00:34.442 +0800 org.quartz.core.QuartzScheduler:[585] - [WorkflowInstance-0][TaskInstance-0] - Scheduler DolphinScheduler_$_development-91661307661519 paused.
   [WARN] 2022-08-24 14:00:34.443 +0800 org.apache.dolphinscheduler.server.master.processor.queue.StateEventResponseService:[125] - [WorkflowInstance-0][TaskInstance-0] - State event loop service interrupted, will stop this loop
   java.lang.InterruptedException: null
           at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
           at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048)
           at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
           at org.apache.dolphinscheduler.server.master.processor.queue.StateEventResponseService$StateEventResponseWorker.run(StateEventResponseService.java:121)
   [INFO] 2022-08-24 14:00:34.443 +0800 org.apache.dolphinscheduler.server.master.processor.queue.StateEventResponseService:[132] - [WorkflowInstance-0][TaskInstance-0] - State event loop service stopped
   [INFO] 2022-08-24 14:00:34.446 +0800 org.quartz.core.QuartzScheduler:[740] - [WorkflowInstance-0][TaskInstance-0] - Scheduler DolphinScheduler_$_development-91661307661519 shutdown complete.
   [INFO] 2022-08-24 14:00:34.446 +0800 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerBootstrap:[117] - [WorkflowInstance-0][TaskInstance-0] - Master schedule bootstrap stopping...
   [INFO] 2022-08-24 14:00:34.446 +0800 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerBootstrap:[118] - [WorkflowInstance-0][TaskInstance-0] - Master schedule bootstrap stopped...
   [INFO] 2022-08-24 14:00:34.450 +0800 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[226] - [WorkflowInstance-0][TaskInstance-0] - Master node : 192.168.120.19:5678 unRegistry to register center.
   [INFO] 2022-08-24 14:00:34.450 +0800 org.apache.dolphinscheduler.server.master.registry.ServerNodeManager:[302] - [WorkflowInstance-0][TaskInstance-0] - master node : /nodes/master/192.168.120.19:5678 down.
   [INFO] 2022-08-24 14:00:34.450 +0800 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[125] - [WorkflowInstance-0][TaskInstance-0] - MASTER node deleted : /nodes/master/192.168.120.19:5678
   [INFO] 2022-08-24 14:00:34.452 +0800 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[228] - [WorkflowInstance-0][TaskInstance-0] - MasterServer heartbeat executor shutdown
   [INFO] 2022-08-24 14:00:34.452 +0800 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[140] - [WorkflowInstance-0][TaskInstance-0] - path: /nodes/master/192.168.120.19:5678 not exists
   [INFO] 2022-08-24 14:00:34.453 +0800 org.apache.curator.framework.imps.CuratorFrameworkImpl:[955] - [WorkflowInstance-0][TaskInstance-0] - backgroundOperationsLoop exiting
   [ERROR] 2022-08-24 14:00:34.454 +0800 org.apache.dolphinscheduler.server.master.registry.ServerNodeManager:[324] - [WorkflowInstance-0][TaskInstance-0] - update master nodes error
   org.apache.dolphinscheduler.registry.api.RegistryException: zookeeper release lock error
           at org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperRegistry.acquireLock(ZookeeperRegistry.java:215)
           at org.apache.dolphinscheduler.service.registry.RegistryClient.getLock(RegistryClient.java:231)
           at org.apache.dolphinscheduler.server.master.registry.ServerNodeManager.updateMasterNodes(ServerNodeManager.java:319)
           at org.apache.dolphinscheduler.server.master.registry.ServerNodeManager.access$800(ServerNodeManager.java:68)
           at org.apache.dolphinscheduler.server.master.registry.ServerNodeManager$MasterDataListener.notify(ServerNodeManager.java:303)
           at org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperRegistry.lambda$subscribe$1(ZookeeperRegistry.java:128)
           at org.apache.curator.framework.recipes.cache.TreeCache$2.apply(TreeCache.java:760)
           at org.apache.curator.framework.recipes.cache.TreeCache$2.apply(TreeCache.java:754)
           at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:100)
           at org.apache.curator.shaded.com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
           at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:92)
           at org.apache.curator.framework.recipes.cache.TreeCache.callListeners(TreeCache.java:753)
           at org.apache.curator.framework.recipes.cache.TreeCache.access$1900(TreeCache.java:75)
           at org.apache.curator.framework.recipes.cache.TreeCache$4.run(TreeCache.java:865)
           at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
           at java.util.concurrent.FutureTask.run(FutureTask.java:266)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:748)
   Caused by: java.io.IOException: Lost connection while trying to acquire lock: /lock/masters
           at org.apache.curator.framework.recipes.locks.InterProcessMutex.acquire(InterProcessMutex.java:91)
           at org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperRegistry.acquireLock(ZookeeperRegistry.java:204)
           ... 18 common frames omitted
   [ERROR] 2022-08-24 14:00:34.454 +0800 org.apache.dolphinscheduler.server.master.registry.ServerNodeManager:[307] - [WorkflowInstance-0][TaskInstance-0] - MasterNodeListener capture data change and get data failed.
   java.lang.NullPointerException: null
           at org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperRegistry.releaseLock(ZookeeperRegistry.java:222)
           at org.apache.dolphinscheduler.service.registry.RegistryClient.releaseLock(RegistryClient.java:235)
           at org.apache.dolphinscheduler.server.master.registry.ServerNodeManager.updateMasterNodes(ServerNodeManager.java:326)
           at org.apache.dolphinscheduler.server.master.registry.ServerNodeManager.access$800(ServerNodeManager.java:68)
           at org.apache.dolphinscheduler.server.master.registry.ServerNodeManager$MasterDataListener.notify(ServerNodeManager.java:303)
           at org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperRegistry.lambda$subscribe$1(ZookeeperRegistry.java:128)
           at org.apache.curator.framework.recipes.cache.TreeCache$2.apply(TreeCache.java:760)
           at org.apache.curator.framework.recipes.cache.TreeCache$2.apply(TreeCache.java:754)
           at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:100)
           at org.apache.curator.shaded.com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
           at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:92)
           at org.apache.curator.framework.recipes.cache.TreeCache.callListeners(TreeCache.java:753)
           at org.apache.curator.framework.recipes.cache.TreeCache.access$1900(TreeCache.java:75)
           at org.apache.curator.framework.recipes.cache.TreeCache$4.run(TreeCache.java:865)
           at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
           at java.util.concurrent.FutureTask.run(FutureTask.java:266)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:748)
   [INFO] 2022-08-24 14:00:34.455 +0800 org.apache.dolphinscheduler.service.registry.RegistryClient:[160] - [WorkflowInstance-0][TaskInstance-0] - MASTER server dead , and /nodes/master/192.168.120.19:5678 added to zk dead server path success
   [INFO] 2022-08-24 14:00:34.455 +0800 org.apache.dolphinscheduler.server.master.service.FailoverService:[53] - [WorkflowInstance-0][TaskInstance-0] - Master failover starting, masterServer: 192.168.120.19:5678
   [ERROR] 2022-08-24 14:00:34.456 +0800 org.apache.dolphinscheduler.server.master.service.MasterFailoverService:[113] - [WorkflowInstance-0][TaskInstance-0] - Master server failover failed, host:192.168.120.19:5678
   org.apache.dolphinscheduler.registry.api.RegistryException: zookeeper release lock error
           at org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperRegistry.acquireLock(ZookeeperRegistry.java:215)
           at org.apache.dolphinscheduler.service.registry.RegistryClient.getLock(RegistryClient.java:231)
           at org.apache.dolphinscheduler.server.master.service.MasterFailoverService.failoverMaster(MasterFailoverService.java:110)
           at org.apache.dolphinscheduler.server.master.service.MasterFailoverService$$FastClassBySpringCGLIB$$479c980c.invoke(<generated>)
           at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
           at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:783)
           at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
           at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:753)
           at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:97)
           at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
           at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:753)
           at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:698)
           at org.apache.dolphinscheduler.server.master.service.MasterFailoverService$$EnhancerBySpringCGLIB$$a2da3675.failoverMaster(<generated>)
           at org.apache.dolphinscheduler.server.master.service.FailoverService.failoverServerWhenDown(FailoverService.java:54)
           at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.removeMasterNodePath(MasterRegistryClient.java:147)
           at org.apache.dolphinscheduler.server.master.registry.MasterRegistryDataListener.handleMasterEvent(MasterRegistryDataListener.java:66)
           at org.apache.dolphinscheduler.server.master.registry.MasterRegistryDataListener.notify(MasterRegistryDataListener.java:52)
           at org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperRegistry.lambda$subscribe$1(ZookeeperRegistry.java:128)
           at org.apache.curator.framework.recipes.cache.TreeCache$2.apply(TreeCache.java:760)
           at org.apache.curator.framework.recipes.cache.TreeCache$2.apply(TreeCache.java:754)
           at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:100)
           at org.apache.curator.shaded.com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
           at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:92)
           at org.apache.curator.framework.recipes.cache.TreeCache.callListeners(TreeCache.java:753)
           at org.apache.curator.framework.recipes.cache.TreeCache.access$1900(TreeCache.java:75)
           at org.apache.curator.framework.recipes.cache.TreeCache$4.run(TreeCache.java:865)
           at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
           at java.util.concurrent.FutureTask.run(FutureTask.java:266)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:748)
   Caused by: java.lang.IllegalStateException: Expected state [STARTED] was [STOPPED]
           at org.apache.curator.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:823)
           at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkState(CuratorFrameworkImpl.java:432)
           at org.apache.curator.framework.imps.CuratorFrameworkImpl.create(CuratorFrameworkImpl.java:445)
           at org.apache.curator.framework.recipes.locks.StandardLockInternalsDriver.createsTheLock(StandardLockInternalsDriver.java:54)
           at org.apache.curator.framework.recipes.locks.LockInternals.attemptLock(LockInternals.java:225)
           at org.apache.curator.framework.recipes.locks.InterProcessMutex.internalLock(InterProcessMutex.java:237)
           at org.apache.curator.framework.recipes.locks.InterProcessMutex.acquire(InterProcessMutex.java:89)
           at org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperRegistry.acquireLock(ZookeeperRegistry.java:204)
           ... 30 common frames omitted
   [ERROR] 2022-08-24 14:00:34.456 +0800 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[150] - [WorkflowInstance-0][TaskInstance-0] - MASTER server failover failed, host:192.168.120.19:5678
   java.lang.NullPointerException: null
           at org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperRegistry.releaseLock(ZookeeperRegistry.java:222)
           at org.apache.dolphinscheduler.service.registry.RegistryClient.releaseLock(RegistryClient.java:235)
           at org.apache.dolphinscheduler.server.master.service.MasterFailoverService.failoverMaster(MasterFailoverService.java:115)
           at org.apache.dolphinscheduler.server.master.service.MasterFailoverService$$FastClassBySpringCGLIB$$479c980c.invoke(<generated>)
           at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
           at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:783)
           at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
           at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:753)
           at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:97)
           at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
           at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:753)
           at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:698)
           at org.apache.dolphinscheduler.server.master.service.MasterFailoverService$$EnhancerBySpringCGLIB$$a2da3675.failoverMaster(<generated>)
           at org.apache.dolphinscheduler.server.master.service.FailoverService.failoverServerWhenDown(FailoverService.java:54)
           at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.removeMasterNodePath(MasterRegistryClient.java:147)
           at org.apache.dolphinscheduler.server.master.registry.MasterRegistryDataListener.handleMasterEvent(MasterRegistryDataListener.java:66)
           at org.apache.dolphinscheduler.server.master.registry.MasterRegistryDataListener.notify(MasterRegistryDataListener.java:52)
           at org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperRegistry.lambda$subscribe$1(ZookeeperRegistry.java:128)
           at org.apache.curator.framework.recipes.cache.TreeCache$2.apply(TreeCache.java:760)
           at org.apache.curator.framework.recipes.cache.TreeCache$2.apply(TreeCache.java:754)
           at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:100)
           at org.apache.curator.shaded.com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
           at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:92)
           at org.apache.curator.framework.recipes.cache.TreeCache.callListeners(TreeCache.java:753)
           at org.apache.curator.framework.recipes.cache.TreeCache.access$1900(TreeCache.java:75)
           at org.apache.curator.framework.recipes.cache.TreeCache$4.run(TreeCache.java:865)
           at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
           at java.util.concurrent.FutureTask.run(FutureTask.java:266)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:748)
   [INFO] 2022-08-24 14:00:34.457 +0800 org.apache.zookeeper.ClientCnxn:[522] - [WorkflowInstance-0][TaskInstance-0] - EventThread shut down for session: 0x1029cabe5ed0006
   [INFO] 2022-08-24 14:00:34.457 +0800 org.apache.zookeeper.ZooKeeper:[693] - [WorkflowInstance-0][TaskInstance-0] - Session: 0x1029cabe5ed0006 closed
   [INFO] 2022-08-24 14:00:34.457 +0800 org.apache.dolphinscheduler.server.master.processor.queue.TaskEventService:[126] - [WorkflowInstance-0][TaskInstance-0] - StateEventResponseWorker stopped
   [WARN] 2022-08-24 14:00:34.458 +0800 org.apache.dolphinscheduler.server.master.processor.queue.TaskEventService:[148] - [WorkflowInstance-0][TaskInstance-0] - TaskEvent handle thread interrupted, will return this loop
   [INFO] 2022-08-24 14:00:34.462 +0800 com.zaxxer.hikari.HikariDataSource:[350] - [WorkflowInstance-0][TaskInstance-0] - DolphinScheduler - Shutdown initiated...
   [INFO] 2022-08-24 14:00:34.467 +0800 com.zaxxer.hikari.HikariDataSource:[352] - [WorkflowInstance-0][TaskInstance-0] - DolphinScheduler - Shutdown completed.
   ```
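   The stack traces also show a secondary problem. In both `ServerNodeManager.updateMasterNodes` and `MasterFailoverService.failoverMaster`, `RegistryClient.releaseLock` is still reached after `acquireLock` has already failed. It then throws a `NullPointerException`, likely because no lock was ever taken. One common way to avoid this is to acquire the lock before entering the `try` block, so the `finally` clause only releases a lock that is actually held. This is a minimal sketch, not DolphinScheduler's actual code. `RegistryLock` and `GuardedLockSketch` are hypothetical names used only for illustration.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   // Hypothetical stand-in for a registry-backed lock; this is NOT DolphinScheduler's API.
   interface RegistryLock {
       void acquire(); // may throw, e.g. when the registry connection is lost
       void release(); // should only be called by a current holder
   }

   final class GuardedLockSketch {

       // Acquire OUTSIDE the try block: if acquire() throws, the finally clause is
       // never entered, so release() cannot fail on a lock that was never taken.
       static void runLocked(RegistryLock lock, Runnable action) {
           lock.acquire();
           try {
               action.run();
           } finally {
               lock.release(); // reached only after a successful acquire()
           }
       }

       public static void main(String[] args) {
           List<String> events = new ArrayList<>();
           RegistryLock lostConnection = new RegistryLock() {
               @Override public void acquire() { throw new IllegalStateException("connection lost"); }
               @Override public void release() { events.add("release"); }
           };
           try {
               runLocked(lostConnection, () -> events.add("action"));
           } catch (IllegalStateException e) {
               System.out.println("caught: " + e.getMessage()); // caught: connection lost
           }
           System.out.println(events); // []: neither the action nor release() ran
       }
   }
   ```

   With this shape, the original acquire failure reaches the caller unchanged, and the caller can still catch and log it around `runLocked`.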
   
   ### What you expected to happen
   
   The master-server keeps running normally.
   
   ### How to reproduce
   
   Start the master-server so that it registers with ZooKeeper, then wait for a while.
   
   ### Anything else
   
   _No response_
   
   ### Version
   
   3.0.0
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [dolphinscheduler] stalary closed issue #11635: [Bug] [Master-server] Master-server often disconnect with zookeeper due to server down

stalary closed issue #11635: [Bug] [Master-server] Master-server often disconnect with zookeeper due to server down
URL: https://github.com/apache/dolphinscheduler/issues/11635




[GitHub] [dolphinscheduler] github-actions[bot] commented on issue #11635: [Bug] [Master-server] Master-server often disconnect with zookeeper due to server down

github-actions[bot] commented on issue #11635:
URL: https://github.com/apache/dolphinscheduler/issues/11635#issuecomment-1226683364

   Thank you for your feedback; we have received your issue. Please wait patiently for a reply.
   * To help us understand your request as quickly as possible, please provide detailed information, the version, or screenshots.
   * If you haven't received a reply for a long time, you can [join our Slack](https://s.apache.org/dolphinscheduler-slack) and post your question in the `#troubleshooting` channel.




[GitHub] [dolphinscheduler] QJG666 commented on issue #11635: [Bug] [Master-server] Master-server often disconnect with zookeeper due to server down

QJG666 commented on issue #11635:
URL: https://github.com/apache/dolphinscheduler/issues/11635#issuecomment-1295763775

   > I solved this problem by adjusting the ZooKeeper memory size and the timeout configuration.
   
   Hello, I have the same problem with DolphinScheduler 3.0.1. How do I adjust the ZooKeeper memory and timeout parameters?
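   The quoted comment does not say which values were changed. When raising the timeout, one detail is worth checking: a ZooKeeper server does not grant an arbitrary session timeout. It clamps the client's request to `[minSessionTimeout, maxSessionTimeout]`, which default to 2 * `tickTime` and 20 * `tickTime` unless set in `zoo.cfg`. Raising only the timeout that the master requests may therefore have no effect. The minimal sketch below models that rule. `SessionTimeoutNegotiation` is a hypothetical name, and the example values are assumptions.

   ```java
   // Sketch of how a ZooKeeper server clamps the session timeout a client requests.
   // Defaults per the ZooKeeper Administrator's Guide: minSessionTimeout = 2 * tickTime
   // and maxSessionTimeout = 20 * tickTime, unless set explicitly in zoo.cfg.
   final class SessionTimeoutNegotiation {

       static final int UNSET = -1; // stands for "not configured in zoo.cfg"

       /** Timeout (ms) the server would grant for the requested value. */
       static int negotiate(int requestedMs, int tickTimeMs, int minSessionTimeoutMs, int maxSessionTimeoutMs) {
           int min = (minSessionTimeoutMs == UNSET) ? 2 * tickTimeMs : minSessionTimeoutMs;
           int max = (maxSessionTimeoutMs == UNSET) ? 20 * tickTimeMs : maxSessionTimeoutMs;
           return Math.max(min, Math.min(requestedMs, max));
       }

       public static void main(String[] args) {
           // tickTime=2000, as in ZooKeeper's zoo_sample.cfg; no explicit bounds.
           System.out.println(negotiate(60_000, 2_000, UNSET, UNSET));   // 40000 (capped at 20 * 2000)
           // Raising maxSessionTimeout in zoo.cfg lets a 60 s request through.
           System.out.println(negotiate(60_000, 2_000, UNSET, 120_000)); // 60000
           // Requests below the floor are raised to 2 * tickTime.
           System.out.println(negotiate(1_000, 2_000, UNSET, UNSET));    // 4000
       }
   }
   ```

   In practice, both the timeout on the master side (whatever your DolphinScheduler version calls it in its registry settings) and `maxSessionTimeout` on the ZooKeeper side may need to change together.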




[GitHub] [dolphinscheduler] stalary commented on issue #11635: [Bug] [Master-server] Master-server often disconnect with zookeeper due to server down

stalary commented on issue #11635:
URL: https://github.com/apache/dolphinscheduler/issues/11635#issuecomment-1236441040

   I solved this problem by adjusting the ZooKeeper memory size and the timeout configuration.
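   One plausible link between the two settings is offered here as a hypothesis, not as the confirmed cause in this issue. If a JVM (the ZooKeeper server or the master itself) is short on memory, long stop-the-world GC pauses can stall heartbeats for longer than the session timeout. The server then expires the session and deletes its ephemeral nodes, which would match the `MASTER node deleted` lines in the log. The sketch below models this with a simplified rule: a session expires once the gap between contacts exceeds the timeout. `SessionPauseModel` and the timeline are hypothetical.

   ```java
   import java.util.Arrays;
   import java.util.List;

   // Simplified model (assumption): a session expires once the server has not heard
   // from the client for longer than the negotiated session timeout. Real ZooKeeper
   // checks expiry in tickTime-sized buckets, so the exact boundary is fuzzier.
   final class SessionPauseModel {

       /** Longest gap (ms) between consecutive successful client-to-server contacts. */
       static long longestGap(List<Long> contactTimesMs) {
           long longest = 0;
           for (int i = 1; i < contactTimesMs.size(); i++) {
               longest = Math.max(longest, contactTimesMs.get(i) - contactTimesMs.get(i - 1));
           }
           return longest;
       }

       /** True if, under this model, some gap outlasts the session timeout. */
       static boolean wouldExpire(List<Long> contactTimesMs, long sessionTimeoutMs) {
           return longestGap(contactTimesMs) > sessionTimeoutMs;
       }

       public static void main(String[] args) {
           // Hypothetical timeline: contact every 10 s, then a 35 s stall starting at
           // t = 20 s (for example a long GC pause in a memory-starved JVM).
           List<Long> contacts = Arrays.asList(0L, 10_000L, 20_000L, 55_000L, 65_000L);
           System.out.println(longestGap(contacts));          // 35000
           System.out.println(wouldExpire(contacts, 30_000)); // true
           System.out.println(wouldExpire(contacts, 60_000)); // false
       }
   }
   ```

   In this model, raising the timeout only widens the pause that can be tolerated, while giving the JVM enough heap removes the pause itself. That may be why both were adjusted.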




[GitHub] [dolphinscheduler] stalary commented on issue #11635: [Bug] [Master-server] Master-server often disconnect with zookeeper due to server down

stalary commented on issue #11635:
URL: https://github.com/apache/dolphinscheduler/issues/11635#issuecomment-1226852904

   Another observation: `worker-server` and `Kafka`, which use the same ZooKeeper, have no problems; only `master-server` is affected.

