Posted to commits@dolphinscheduler.apache.org by GitBox <gi...@apache.org> on 2020/06/29 11:11:16 UTC

[GitHub] [incubator-dolphinscheduler] antlers-lv opened a new issue #3076: heartbeat for zk failed

antlers-lv opened a new issue #3076:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3076


   [ERROR] 2020-06-29 18:16:59.995 org.apache.dolphinscheduler.common.zk.AbstractZKClient:[172] - heartbeat for zk failed : KeeperErrorCode = ConnectionLoss for /dolphinscheduler/workers/127.0.0.1_0000000008
   org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /dolphinscheduler/workers/127.0.0.1_0000000008
   	at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
   	at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
   	at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1336)
   	at org.apache.curator.framework.imps.SetDataBuilderImpl$4.call(SetDataBuilderImpl.java:282)
   	at org.apache.curator.framework.imps.SetDataBuilderImpl$4.call(SetDataBuilderImpl.java:278)
   	at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109)
   	at org.apache.curator.framework.imps.SetDataBuilderImpl.pathInForeground(SetDataBuilderImpl.java:275)
   	at org.apache.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:261)
   	at org.apache.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:34)
   	at org.apache.dolphinscheduler.common.zk.AbstractZKClient.heartBeatForZk(AbstractZKClient.java:169)
   	at org.apache.dolphinscheduler.server.worker.WorkerServer$2.run(WorkerServer.java:293)
   	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
   	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
   	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
   	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:748)
   [INFO] 2020-06-29 18:17:00.009 org.apache.dolphinscheduler.server.worker.WorkerServer:[223] - worker server is stopping ..., cause : heartbeat for zk exception, release resources and stop myself
   [INFO] 2020-06-29 18:17:03.010 org.apache.dolphinscheduler.server.worker.WorkerServer:[240] - heartbeat service stopped
   [ERROR] 2020-06-29 18:17:07.612 org.apache.dolphinscheduler.common.queue.TaskQueueZkImpl:[369] - get all tasks from tasks queue exception
   org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /dolphinscheduler/tasks_kill
   	at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
   	at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
   	at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1659)
   	at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:230)
   	at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:219)
   	at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109)
   	at org.apache.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:216)
   	at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:207)
   	at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:40)
   	at org.apache.dolphinscheduler.common.queue.TaskQueueZkImpl.smembers(TaskQueueZkImpl.java:361)
   	at org.apache.dolphinscheduler.server.worker.WorkerServer$3.run(WorkerServer.java:324)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:748)
   [INFO] 2020-06-29 18:17:38.707 org.apache.dolphinscheduler.server.worker.WorkerServer:[248] - threadpool service stopped
   [INFO] 2020-06-29 18:17:38.707 org.apache.dolphinscheduler.server.worker.WorkerServer:[255] - worker kill executor service stopped
   [INFO] 2020-06-29 18:17:38.707 org.apache.dolphinscheduler.server.worker.WorkerServer:[262] - worker fetch task service stopped
   [ERROR] 2020-06-29 18:17:38.707 org.apache.dolphinscheduler.common.queue.TaskQueueZkImpl:[285] - delete task:2_6853_2_6766_-1 from zookeeper fail, exception:
   java.lang.InterruptedException: null
   	at java.lang.Object.wait(Native Method)
   	at java.lang.Object.wait(Object.java:502)
   	at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1407)
   	at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:880)
   	at org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:250)
   	at org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:244)
   	at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109)
   	at org.apache.curator.framework.imps.DeleteBuilderImpl.pathInForeground(DeleteBuilderImpl.java:241)
   	at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:225)
   	at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:35)
   	at org.apache.dolphinscheduler.common.queue.TaskQueueZkImpl.removeNode(TaskQueueZkImpl.java:282)
   	at org.apache.dolphinscheduler.server.worker.runner.FetchTaskThread.removeNodeFromTaskQueue(FetchTaskThread.java:251)
   	at org.apache.dolphinscheduler.server.worker.runner.FetchTaskThread.run(FetchTaskThread.java:234)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:748)
   [INFO] 2020-06-29 18:17:38.726 org.apache.zookeeper.ZooKeeper:[693] - Session: 0x10058d42144008b closed
   [ERROR] 2020-06-29 18:17:38.731 org.apache.curator.framework.imps.CuratorFrameworkImpl:[566] - Background exception was not retry-able or retry gave up
   java.lang.IllegalStateException: Client is not started
   	at org.apache.curator.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:176)
   	at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115)
   	at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:835)
   	at org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:507)
   	at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:221)
   	at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:35)
   	at org.apache.curator.framework.imps.FailedDeleteManager.addFailedDelete(FailedDeleteManager.java:55)
   	at org.apache.curator.framework.imps.DeleteBuilderImpl.pathInForeground(DeleteBuilderImpl.java:274)
   	at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:225)
   	at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:35)
   	at org.apache.curator.framework.recipes.locks.LockInternals.deleteOurPath(LockInternals.java:339)
   	at org.apache.curator.framework.recipes.locks.LockInternals.releaseLock(LockInternals.java:123)
   	at org.apache.curator.framework.recipes.locks.InterProcessMutex.release(InterProcessMutex.java:154)
   	at org.apache.dolphinscheduler.common.zk.AbstractZKClient.releaseMutex(AbstractZKClient.java:497)
   	at org.apache.dolphinscheduler.server.worker.runner.FetchTaskThread.run(FetchTaskThread.java:240)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:748)
   [INFO] 2020-06-29 18:17:38.733 org.apache.dolphinscheduler.common.zk.AbstractZKClient:[138] - zookeeper close ...
   [INFO] 2020-06-29 18:17:38.733 org.apache.dolphinscheduler.server.worker.WorkerServer:[270] - zookeeper service stopped


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-dolphinscheduler] dailidong edited a comment on issue #3076: heartbeat for zk failed

Posted by GitBox <gi...@apache.org>.
dailidong edited a comment on issue #3076:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3076#issuecomment-652287645


   First, please check whether there is packet loss on the network.
   If the network is fine, please post the contents of zookeeper.properties from the conf directory. The default values of the following two items are too short; try values like these:
   `
   zookeeper.session.timeout=60000
   zookeeper.connection.timeout=30000
   `
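
To make the check above concrete, here is a minimal sketch (not DolphinScheduler code) that reads the two keys from properties text and reports whether either is below the values quoted in this thread. The class name `ZkTimeoutCheck`, its method names, and the use of 60000 / 30000 as minimums are assumptions for illustration only; a real check would load the text from conf/zookeeper.properties instead of a string.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper: none of these names exist in DolphinScheduler.
public class ZkTimeoutCheck {
    // Assumed minimums, taken from the values suggested in this thread.
    public static final int MIN_SESSION_TIMEOUT_MS = 60000;
    public static final int MIN_CONNECTION_TIMEOUT_MS = 30000;

    /** Parses key=value text in java.util.Properties format. */
    public static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Returns the value of the key in milliseconds, or -1 if it is missing or not an integer. */
    public static int timeoutMs(Properties props, String key) {
        String raw = props.getProperty(key);
        if (raw == null) {
            return -1;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    /** True if either timeout is missing, malformed, or below the assumed minimum. */
    public static boolean isTooShort(Properties props) {
        return timeoutMs(props, "zookeeper.session.timeout") < MIN_SESSION_TIMEOUT_MS
            || timeoutMs(props, "zookeeper.connection.timeout") < MIN_CONNECTION_TIMEOUT_MS;
    }

    public static void main(String[] args) {
        Properties props = parse(
            "zookeeper.session.timeout=60000\n"
            + "zookeeper.connection.timeout=30000\n");
        System.out.println("timeouts too short: " + isTooShort(props));
    }
}
```

A missing or malformed key is treated as too short here, on the assumption that a fallback default may apply; adjust that if your deployment behaves differently.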
   





[GitHub] [incubator-dolphinscheduler] gabrywu commented on issue #3076: heartbeat for zk failed

Posted by GitBox <gi...@apache.org>.
gabrywu commented on issue #3076:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3076#issuecomment-657422948


   @antlers-lv Yes, @dailidong gave good suggestions; you can try the parameters above. I will close this ticket. Feel free to reopen it if necessary.














[GitHub] [incubator-dolphinscheduler] gabrywu closed issue #3076: heartbeat for zk failed

Posted by GitBox <gi...@apache.org>.
gabrywu closed issue #3076:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3076


   

