Posted to commits@dolphinscheduler.apache.org by GitBox <gi...@apache.org> on 2020/06/18 01:47:42 UTC

[GitHub] [incubator-dolphinscheduler] GabrielWithTina opened a new issue #3009: [BUG] Master server accidently shutdown every day

GabrielWithTina opened a new issue #3009:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3009


   
   
   **Describe the bug**
   When the ZooKeeper connection times out and the master reconnects, the master always shuts down, even though ZooKeeper is reachable (the successful connection is visible in the log).
   
   From the log, it seems the master reuses the existing ZooKeeper session object to reconnect, but that session has already expired. This leads the master to conclude that ZooKeeper cannot be reached and to shut itself down. The master then opens a new ZooKeeper session, which connects successfully, but it still shuts itself down.
   
   
   
   **Expected behavior**
   I believe this is a bug caused by reusing the expired ZooKeeper session; the master should keep running once a new session has been established.
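The behavior the report expects, a master that survives once a new session is up, can be sketched as a small state machine. This is a hypothetical Python model, not DolphinScheduler code: `ConnectionState` only borrows the names of Curator's connection states, and `MasterRegistryModel` and its methods are invented for illustration. It relies on one ZooKeeper fact, that ephemeral nodes are deleted when their session expires, and it deliberately ignores failover: while the session was gone, other masters may already have treated this master as dead, which a real fix would have to reconcile.

```python
from enum import Enum


class ConnectionState(Enum):
    """Plain stand-in named after Curator's ConnectionState values; not the Curator API."""
    CONNECTED = "CONNECTED"
    SUSPENDED = "SUSPENDED"
    RECONNECTED = "RECONNECTED"
    LOST = "LOST"


class MasterRegistryModel:
    """Hypothetical model of a server whose liveness marker is an ephemeral
    znode; ZooKeeper deletes that znode when the owning session expires."""

    def __init__(self):
        self.session_id = None   # session that currently owns our ephemeral node
        self.registrations = []  # every session we (re)registered under

    @property
    def is_registered(self):
        return self.session_id is not None

    def on_state_change(self, state, session_id=None):
        if state in (ConnectionState.CONNECTED, ConnectionState.RECONNECTED):
            if session_id is not None and session_id != self.session_id:
                # A different session means the old ephemeral node is gone:
                # re-register under the new session instead of exiting.
                self.session_id = session_id
                self.registrations.append(session_id)
        elif state is ConnectionState.LOST:
            # The session expired; drop the registration and wait for
            # RECONNECTED rather than stopping the process.
            self.session_id = None
        # SUSPENDED: the connection dropped but the session may still be
        # valid, so the registration is left as it is.


if __name__ == "__main__":
    # Replays the LOST -> RECONNECTED sequence, using session IDs from this thread.
    model = MasterRegistryModel()
    model.on_state_change(ConnectionState.CONNECTED, "0x572ab3af9972208")
    model.on_state_change(ConnectionState.LOST)
    model.on_state_change(ConnectionState.RECONNECTED, "0x472c1a29a0602fd")
    print(model.is_registered, model.registrations)
```

Replaying the sequence from the log leaves the model registered under the new session 0x472c1a29a0602fd rather than stopped; a SUSPENDED event on its own never touches the registration.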
   
   **Screenshots**
   Please see the attached log for details:
   
   [dolphinscheduler-master.log](https://github.com/apache/incubator-dolphinscheduler/files/4795817/dolphinscheduler-master.log)
   
   **Which version of Dolphin Scheduler:**
   - 1.2.1
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-dolphinscheduler] GabrielWithTina commented on issue #3009: [BUG] Master server accidently shutdown every day

Posted by GitBox <gi...@apache.org>.
GabrielWithTina commented on issue #3009:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3009#issuecomment-645723254


   Here are my ZooKeeper settings:
   #dolphinscheduler failover directory
   zookeeper.session.timeout=60000
   zookeeper.connection.timeout=60000
   zookeeper.retry.base.sleep=2000
   zookeeper.retry.max.sleep=120000
   zookeeper.retry.maxtime=29
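To see what the retry settings above allow, here is a minimal Python model of the sleep formula in Curator's `ExponentialBackoffRetry`. It assumes (this thread does not confirm it) that DolphinScheduler 1.2.1 passes `zookeeper.retry.base.sleep`, `zookeeper.retry.maxtime` and `zookeeper.retry.max.sleep` to `ExponentialBackoffRetry(baseSleepTimeMs, maxRetries, maxSleepMs)`; the formula mirrors Curator's `getSleepTimeMs` as we understand it, and as far as we know Curator already pins `maxRetries` to at most 29. Note that a retry policy only controls how long Curator keeps retrying operations; it cannot keep a session alive past `zookeeper.session.timeout`.

```python
import random

# Values from the zookeeper.properties snippet above.
BASE_SLEEP_MS = 2000    # zookeeper.retry.base.sleep
MAX_SLEEP_MS = 120000   # zookeeper.retry.max.sleep
MAX_RETRIES = 29        # zookeeper.retry.maxtime


def sleep_ms(retry_count, rng=random):
    """Sample one sleep, mirroring (as we understand it) Curator's
    ExponentialBackoffRetry: base * max(1, rand in [0, 2^(n+1))), pinned to max."""
    sleep = BASE_SLEEP_MS * max(1, rng.randrange(1 << (retry_count + 1)))
    return min(sleep, MAX_SLEEP_MS)


def sleep_bounds_ms(retry_count):
    """Smallest and largest sleep the formula can return for this retry."""
    upper = BASE_SLEEP_MS * ((1 << (retry_count + 1)) - 1)
    return min(BASE_SLEEP_MS, MAX_SLEEP_MS), min(upper, MAX_SLEEP_MS)


def total_bounds_ms(max_retries=MAX_RETRIES):
    """Best- and worst-case total sleep if every one of max_retries retries runs."""
    lows, highs = zip(*(sleep_bounds_ms(n) for n in range(max_retries)))
    return sum(lows), sum(highs)


if __name__ == "__main__":
    for n in range(6):
        print(n, sleep_bounds_ms(n))
    low, high = total_bounds_ms()
    print(f"total sleep: {low} .. {high} ms ({high / 60000:.1f} min)")
```

With these values the sleep reaches the 120000 ms cap from the sixth retry on, and 29 retries sleep between 58000 ms and 2994000 ms (about 49.9 minutes) in total.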
   





[GitHub] [incubator-dolphinscheduler] Sam--Shan commented on issue #3009: [BUG] Master server accidently shutdown every day

Posted by GitBox <gi...@apache.org>.
Sam--Shan commented on issue #3009:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3009#issuecomment-726543491


   I have the same issue. At about 2:15 a.m. every day, the master and worker nodes shut down.





[GitHub] [incubator-dolphinscheduler] GabrielWithTina commented on issue #3009: [BUG] Master server accidently shutdown every day

Posted by GitBox <gi...@apache.org>.
GabrielWithTina commented on issue #3009:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3009#issuecomment-645720350


   BTW, the worker server logged no ZooKeeper session expiry at that time and kept running without any issues.





[GitHub] [incubator-dolphinscheduler] GabrielWithTina commented on issue #3009: [BUG] Master server accidently shutdown every day

Posted by GitBox <gi...@apache.org>.
GabrielWithTina commented on issue #3009:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3009#issuecomment-645722959


   For each expired session, the log shows the following sequence:
   
   1. The client closes the socket and attempts to reconnect
   [WARN] 2020-06-18 02:20:34.110 org.apache.zookeeper.ClientCnxn:[1108] - Client session timed out, have not heard from server in 61353ms for sessionid 0x572ab3af9972208
   [INFO] 2020-06-18 02:20:34.318 org.apache.zookeeper.ClientCnxn:[1156] - Client session timed out, have not heard from server in 61353ms for sessionid 0x572ab3af9972208, closing socket connection and attempting reconnect
   
   2. The reconnect fails because the session has expired
   [INFO] 2020-06-18 02:20:36.107 org.apache.zookeeper.ClientCnxn:[879] - Socket connection established to hadoop243/192.192.192.243:2181, initiating session
   [WARN] 2020-06-18 02:20:36.108 org.apache.zookeeper.ClientCnxn:[1285] - Unable to reconnect to ZooKeeper service, session 0x572ab3af9972208 has expired
   [INFO] 2020-06-18 02:20:36.108 org.apache.curator.framework.state.ConnectionStateManager:[228] - State change: LOST
   [WARN] 2020-06-18 02:20:36.108 org.apache.curator.ConnectionState:[336] - Session expired event received
   [INFO] 2020-06-18 02:20:36.108 org.apache.zookeeper.ClientCnxn:[1154] - Unable to reconnect to ZooKeeper service, session 0x572ab3af9972208 has expired, closing socket connection
   [INFO] 2020-06-18 02:20:36.200 org.apache.zookeeper.ClientCnxn:[522] - EventThread shut down for session: 0x572ab3af9972208
   
   
   3. A new session is opened to reconnect
   [INFO] 2020-06-18 02:20:37.118 org.apache.zookeeper.ClientCnxn:[1025] - Opening socket connection to server hadoop243/192.192.192.243:2181. Will not attempt to authenticate using SASL (unknown error)
   [INFO] 2020-06-18 02:20:37.118 org.apache.zookeeper.ClientCnxn:[1025] - Opening socket connection to server hadoop242/192.192.192.242:2181. Will not attempt to authenticate using SASL (unknown error)
   [INFO] 2020-06-18 02:20:37.119 org.apache.zookeeper.ClientCnxn:[879] - Socket connection established to hadoop242/192.192.192.242:2181, initiating session
   [INFO] 2020-06-18 02:20:37.119 org.apache.zookeeper.ClientCnxn:[879] - Socket connection established to hadoop243/192.192.192.243:2181, initiating session
   [INFO] 2020-06-18 02:20:37.417 org.apache.zookeeper.ClientCnxn:[1299] - Session establishment complete on server hadoop243/192.192.192.243:2181, sessionid = 0x472c1a29a0602fd, negotiated timeout = 60000
   [INFO] 2020-06-18 02:20:37.417 org.apache.curator.framework.state.ConnectionStateManager:[228] - State change: RECONNECTED
   [INFO] 2020-06-18 02:20:38.882 org.apache.zookeeper.ClientCnxn:[1299] - Session establishment complete on server hadoop242/192.192.192.242:2181, sessionid = 0x372ab3af9952528, negotiated timeout = 60000
   
   4. The master shuts down
   [INFO] 2020-06-18 02:20:38.882 org.apache.dolphinscheduler.server.master.MasterServer:[180] - master server is stopping ..., cause : i was judged to death, release resources and stop myself
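The four steps can be checked mechanically. Below is a minimal Python sketch that parses a verbatim subset of the log lines above and reports which sessions expired, which new sessions were opened, and how the shutdown lines up with Curator's RECONNECTED event. The regexes are assumptions tuned to this excerpt only; they are not part of DolphinScheduler, ZooKeeper or Curator.

```python
import re
from datetime import datetime, timedelta

# A subset of the log lines quoted above, copied verbatim.
LOG_EXCERPT = [
    "[WARN] 2020-06-18 02:20:34.110 org.apache.zookeeper.ClientCnxn:[1108] - Client session timed out, have not heard from server in 61353ms for sessionid 0x572ab3af9972208",
    "[WARN] 2020-06-18 02:20:36.108 org.apache.zookeeper.ClientCnxn:[1285] - Unable to reconnect to ZooKeeper service, session 0x572ab3af9972208 has expired",
    "[INFO] 2020-06-18 02:20:36.108 org.apache.curator.framework.state.ConnectionStateManager:[228] - State change: LOST",
    "[INFO] 2020-06-18 02:20:37.417 org.apache.zookeeper.ClientCnxn:[1299] - Session establishment complete on server hadoop243/192.192.192.243:2181, sessionid = 0x472c1a29a0602fd, negotiated timeout = 60000",
    "[INFO] 2020-06-18 02:20:37.417 org.apache.curator.framework.state.ConnectionStateManager:[228] - State change: RECONNECTED",
    "[INFO] 2020-06-18 02:20:38.882 org.apache.zookeeper.ClientCnxn:[1299] - Session establishment complete on server hadoop242/192.192.192.242:2181, sessionid = 0x372ab3af9952528, negotiated timeout = 60000",
    "[INFO] 2020-06-18 02:20:38.882 org.apache.dolphinscheduler.server.master.MasterServer:[180] - master server is stopping ..., cause : i was judged to death, release resources and stop myself",
]

# "[LEVEL] yyyy-MM-dd HH:mm:ss.SSS logger:[line] - message"
LINE_RE = re.compile(r"^\[(\w+)\] (\S+ \S+) ([\w.$]+):\[\d+\] - (.*)$")

# The first matching pattern wins, so order matters.
EVENT_PATTERNS = [
    ("idle", re.compile(r"have not heard from server in (\d+)ms")),
    ("expired", re.compile(r"session (0x[0-9a-f]+) has expired")),
    ("established", re.compile(r"sessionid = (0x[0-9a-f]+), negotiated timeout = (\d+)")),
    ("state", re.compile(r"State change: (\w+)")),
    ("stop", re.compile(r"master server is stopping")),
]


def parse(lines):
    """Return (timestamp, event_name, captured_groups) for each recognized line."""
    events = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        ts = datetime.strptime(m.group(2), "%Y-%m-%d %H:%M:%S.%f")
        for name, pattern in EVENT_PATTERNS:
            hit = pattern.search(m.group(4))
            if hit:
                events.append((ts, name, hit.groups()))
                break
    return events


def summarize(events):
    """Answer a few questions about the excerpt (raises StopIteration if an event is missing)."""
    idle_ms = next(int(g[0]) for _, n, g in events if n == "idle")
    # Assumption: the expired session negotiated the same timeout as the new
    # ones (60000 ms, matching zookeeper.session.timeout).
    timeout_ms = next(int(g[1]) for _, n, g in events if n == "established")
    reconnected = next(t for t, n, g in events if n == "state" and g[0] == "RECONNECTED")
    stopped = next(t for t, n, _ in events if n == "stop")
    return {
        "expired_sessions": [g[0] for _, n, g in events if n == "expired"],
        "new_sessions": [g[0] for _, n, g in events if n == "established"],
        "states": [g[0] for _, n, g in events if n == "state"],
        "idle_exceeds_timeout": idle_ms > timeout_ms,
        "stop_after_reconnect_ms": (stopped - reconnected) // timedelta(milliseconds=1),
    }


if __name__ == "__main__":
    for key, value in summarize(parse(LOG_EXCERPT)).items():
        print(f"{key}: {value}")
```

On this excerpt it reports one expired session (0x572ab3af9972208), two new sessions (0x472c1a29a0602fd and 0x372ab3af9952528) whose IDs both differ from the expired one, an idle time of 61353 ms above the 60000 ms timeout, and a shutdown 1465 ms after Curator's RECONNECTED event, in the same millisecond that the second new session finished its handshake.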





[GitHub] [incubator-dolphinscheduler] xingchun-chen commented on issue #3009: [BUG] Master server accidently shutdown every day

Posted by GitBox <gi...@apache.org>.
xingchun-chen commented on issue #3009:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3009#issuecomment-645903098


   @lenboo 





[GitHub] [incubator-dolphinscheduler] Sam--Shan edited a comment on issue #3009: [BUG] Master server accidently shutdown every day

Posted by GitBox <gi...@apache.org>.
Sam--Shan edited a comment on issue #3009:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3009#issuecomment-726543491


   I have the same issue. At about 2:15 a.m. every day, the master and worker nodes shut down.
   
   ![image](https://user-images.githubusercontent.com/10590637/99057428-bc75d500-25d6-11eb-9dcb-6038acc19b59.png)
   

