You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by "Benjamin Mahler (JIRA)" <ji...@apache.org> on 2013/05/08 23:45:17 UTC

[jira] [Created] (MESOS-465) Failure due to ZooKeeper authentication operation timeout in the master detector.

Benjamin Mahler created MESOS-465:
-------------------------------------

             Summary: Failure due to ZooKeeper authentication operation timeout in the master detector.
                 Key: MESOS-465
                 URL: https://issues.apache.org/jira/browse/MESOS-465
             Project: Mesos
          Issue Type: Bug
            Reporter: Benjamin Mahler
            Assignee: Benjamin Mahler


Observed this in production when ZK connectivity issues were present:

I0508 19:15:48.485126 18915 slave.cpp:951] Process exited: @0.0.0.0:0
W0508 19:15:48.485252 18915 slave.cpp:954] WARNING! Master disconnected! Waiting for a new master to be elected.
2013-05-08 19:15:51,621:18889(0x4ddeb940):ZOO_ERROR@handle_socket_error_msg@1528: Socket [10.40.93.122:2181] zk retcode=-7, errno=110(Connection timed out): connection timed out (exceeded timeout by 3ms)
2013-05-08 19:15:51,621:18889(0x4ddeb940):ZOO_INFO@check_events@1585: initiated connection to server [10.40.89.122:2181]
2013-05-08 19:15:51,623:18889(0x4ddeb940):ZOO_ERROR@handle_socket_error_msg@1621: Socket [10.40.89.122:2181] zk retcode=-112, errno=116(Stale NFS file handle): sessionId=0x33e8587f8b50141 has expired.
2013-05-08 19:15:51,623:18889(0x4ddeb940):ZOO_DEBUG@handle_error@1138: Calling a watcher for a ZOO_SESSION_EVENT and the state=ZOO_EXPIRED_SESSION_STATE
2013-05-08 19:15:51,623:18889(0x4ddeb940):ZOO_DEBUG@do_io@317: IO thread terminated
2013-05-08 19:15:51,623:18889(0x4d5ea940):ZOO_DEBUG@process_completions@1765: Calling a watcher for node [], type = -1 event=ZOO_SESSION_EVENT
W0508 19:15:51.623631 18904 detector.cpp:393] Master detector (slave(1)@IP:PORT)  ZooKeeper session expired!
2013-05-08 19:15:51,623:18889(0x4d5ea940):ZOO_DEBUG@do_completion@336: completion thread terminated
2013-05-08 19:15:51,623:18889(0x46ddd940):ZOO_INFO@zookeeper_close@2321: Freeing zookeeper resources for sessionId=0x33e8587f8b50141

2013-05-08 19:15:51,625:18889(0x46ddd940):ZOO_INFO@log_env@658: Client environment:zookeeper.version=zookeeper C client 3.3.4
2013-05-08 19:15:51,626:18889(0x46ddd940):ZOO_INFO@log_env@662: Client environment:host.name=HOSTNAME
2013-05-08 19:15:51,626:18889(0x46ddd940):ZOO_INFO@log_env@669: Client environment:os.name=Linux
2013-05-08 19:15:51,626:18889(0x46ddd940):ZOO_INFO@log_env@670: Client environment:os.arch=2.6.39-t7.el5
2013-05-08 19:15:51,626:18889(0x46ddd940):ZOO_INFO@log_env@671: Client environment:os.version=#1 SMP Tue Oct 23 23:59:44 UTC 2012
2013-05-08 19:15:51,626:18889(0x46ddd940):ZOO_INFO@log_env@679: Client environment:user.name=(null)
2013-05-08 19:15:51,626:18889(0x46ddd940):ZOO_INFO@log_env@687: Client environment:user.home=/root
2013-05-08 19:15:51,626:18889(0x46ddd940):ZOO_INFO@log_env@699: Client environment:user.dir=/
2013-05-08 19:15:51,626:18889(0x46ddd940):ZOO_INFO@zookeeper_init@727: Initiating client connection, host=ZKHOST:2181 sessionTimeout=10000 watcher=0x7f6da70adc00 sessionId=0 sessionPasswd
=<null> context=0x7f6d9c186640 flags=0
2013-05-08 19:15:51,629:18889(0x46ddd940):ZOO_DEBUG@start_threads@152: starting threads...
2013-05-08 19:15:51,629:18889(0x4d5ea940):ZOO_DEBUG@do_io@279: started IO thread
2013-05-08 19:15:51,629:18889(0x4ddeb940):ZOO_DEBUG@do_completion@326: started completion thread
2013-05-08 19:15:51,630:18889(0x4d5ea940):ZOO_INFO@check_events@1585: initiated connection to server [IP:2181]
2013-05-08 19:15:51,632:18889(0x4d5ea940):ZOO_INFO@check_events@1632: session establishment complete on server [ZK_IP:2181], sessionId=0x3e85920347000b, negotiated timeout=10000
2013-05-08 19:15:51,632:18889(0x4d5ea940):ZOO_DEBUG@check_events@1638: Calling a watcher for a ZOO_SESSION_EVENT and the state=ZOO_CONNECTED_STATE
2013-05-08 19:15:51,632:18889(0x4ddeb940):ZOO_DEBUG@process_completions@1765: Calling a watcher for node [], type = -1 event=ZOO_SESSION_EVENT
I0508 19:15:51.632997 18908 detector.cpp:283] Master detector (slave(1)@IP:5051) connected to ZooKeeper ...
I0508 19:15:51.633085 18908 detector.cpp:288] Authenticating to ZooKeeper using scheme 'digest'
2013-05-08 19:15:51,633:18889(0x48de1940):ZOO_DEBUG@send_last_auth_info@1265: Sending auth info request to ZK_IP:2181
I0508 19:15:53.453980 18913 http.cpp:280] HTTP request for '/slave(1)/stats.json'
2013-05-08 19:15:58,302:18889(0x4d5ea940):ZOO_ERROR@handle_socket_error_msg@1528: Socket [ZK_IP:2181] zk retcode=-7, errno=110(Connection timed out): connection timed out (exceeded timeout by 4ms)
2013-05-08 19:15:58,302:18889(0x4d5ea940):ZOO_DEBUG@handle_error@1141: Calling a watcher for a ZOO_SESSION_EVENT and the state=CONNECTING_STATE
2013-05-08 19:15:58,302:18889(0x4ddeb940):ZOO_DEBUG@process_completions@1765: Calling a watcher for node [], type = -1 event=ZOO_SESSION_EVENT
F0508 19:15:58.303388 18908 detector.cpp:291] Failed to authenticate with ZooKeeper: operation timeout
*** Check failure stack trace: ***
    @     0x7f6da732669d  google::LogMessage::Fail()
    @     0x7f6da732c307  google::LogMessage::SendToLog()
    @     0x7f6da7327f4c  google::LogMessage::Flush()
    @     0x7f6da73281b6  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f6da70762e0  mesos::internal::ZooKeeperMasterDetectorProcess::connected()
    @     0x7f6da7077298  std::tr1::_Function_handler<>::_M_invoke()
    @     0x7f6da7078e14  std::tr1::_Function_handler<>::_M_invoke()
    @     0x7f6da7221d7b  std::tr1::function<>::operator()()
    @     0x7f6da71daf9f  process::ProcessBase::visit()
    @     0x7f6da71efec8  process::DispatchEvent::visit()
    @     0x7f6da71e411d  process::ProcessManager::resume()
    @     0x7f6da71e4968  process::schedule()
    @     0x7f6da68fd73d  start_thread
    @     0x7f6da52e1f6d  clone

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira