Posted to dev@mesos.apache.org by "David Robinson (JIRA)" <ji...@apache.org> on 2013/10/01 02:37:23 UTC

[jira] [Created] (MESOS-712) invalid zhandle state

David Robinson created MESOS-712:
------------------------------------

             Summary: invalid zhandle state
                 Key: MESOS-712
                 URL: https://issues.apache.org/jira/browse/MESOS-712
             Project: Mesos
          Issue Type: Bug
    Affects Versions: 0.14.0
            Reporter: David Robinson


{noformat:title=log snippet}
2013-09-29 08:58:30,445:45279(0x7f9024e3f940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 16533ms
2013-09-29 08:58:30,445:45279(0x7f9024e3f940):ZOO_ERROR@handle_socket_error_msg@1528: Socket [192.168.0.1:2181] zk retcode=-7, errno=110(Connection timed out): connection timed out (exceeded timeout by 13199ms)
I0929 08:58:17.544836 45283 cgroups.cpp:1193] Trying to freeze cgroup /cgroup/mesos/framework_201205082337-0000000003-0000_executor_thermos-1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f_tag_8edc5ce9-20bc-4b09-bc92-d9bab7769738
2013-09-29 08:58:30,474:45279(0x7f9024e3f940):ZOO_DEBUG@handle_error@1141: Calling a watcher for a ZOO_SESSION_EVENT and the state=CONNECTING_STATE
2013-09-29 08:58:30,475:45279(0x7f9024e3f940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 16564ms
2013-09-29 08:58:30,475:45279(0x7f901ffff940):ZOO_DEBUG@process_completions@1765: Calling a watcher for node [], type = -1 event=ZOO_SESSION_EVENT
I0929 08:58:30.445508 45282 detector.cpp:251] Trying to create path '/home/mesos/prod/master' in ZooKeeper
2013-09-29 08:58:30,483:45279(0x7f9024e3f940):ZOO_INFO@check_events@1585: initiated connection to server [192.168.0.2:2181]
2013-09-29 08:58:30,488:45279(0x7f9031267940):ZOO_DEBUG@zoo_awexists@2587: Sending request xid=0x5244d598 for path [/home/mesos/prod/master] to 192.168.0.2:2181
2013-09-29 08:58:30,488:45279(0x7f9024e3f940):ZOO_ERROR@handle_socket_error_msg@1621: Socket [192.168.0.2:2181] zk retcode=-112, errno=116(Stale NFS file handle): sessionId=0x340523200364932 has expired.
2013-09-29 08:58:30,489:45279(0x7f9024e3f940):ZOO_DEBUG@handle_error@1138: Calling a watcher for a ZOO_SESSION_EVENT and the state=ZOO_EXPIRED_SESSION_STATE
2013-09-29 08:58:30,489:45279(0x7f9024e3f940):ZOO_DEBUG@do_io@317: IO thread terminated
2013-09-29 08:58:30,489:45279(0x7f901ffff940):ZOO_DEBUG@process_completions@1765: Calling a watcher for node [], type = -1 event=ZOO_SESSION_EVENT
2013-09-29 08:58:30,489:45279(0x7f901ffff940):ZOO_DEBUG@process_completions@1784: Calling COMPLETION_STAT for xid=0x5244d598 rc=-112
I0929 08:58:30.475751 45283 cgroups.cpp:1232] Successfully froze cgroup /cgroup/mesos/framework_201205082337-0000000003-0000_executor_thermos-1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f_tag_8edc5ce9-20bc-4b09-bc92-d9bab7769738 after 1 attempts
F0929 08:58:30.492090 45282 detector.cpp:266] Failed to create '/home/mesos/prod/master' in ZooKeeper: invalid zhandle state
*** Check failure stack trace: ***
I0929 08:58:30.492761 45292 cgroups.cpp:1208] Trying to thaw cgroup /cgroup/mesos/framework_201205082337-0000000003-0000_executor_thermos-1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f_tag_8edc5ce9-20bc-4b09-bc92-d9bab7769738
I0929 08:58:31.144810 45291 cgroups_isolator.cpp:937] Executor thermos-1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f of framework 201205082337-0000000003-0000 terminated with status 9
I0929 08:58:32.791193 45292 cgroups.cpp:1318] Successfully thawed /cgroup/mesos/framework_201205082337-0000000003-0000_executor_thermos-1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f_tag_8edc5ce9-20bc-4b09-bc92-d9bab7769738
I0929 08:58:33.675348 45298 cgroups_isolator.cpp:1275] Successfully destroyed cgroup mesos/framework_201205082337-0000000003-0000_executor_thermos-1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f_tag_8edc5ce9-20bc-4b09-bc92-d9bab7769738
I0929 08:58:33.676269 45300 slave.cpp:2158] Executor 'thermos-1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f' of framework 201205082337-0000000003-0000 has terminated with signal Killed
I0929 08:58:33.678154 45300 slave.cpp:1778] Handling status update TASK_FAILED (UUID: 4d90de5a-cdad-4bb8-ab93-7c4f185a0d24) for task 1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f of framework 201205082337-0000000003-0000 from @0.0.0.0:0
I0929 08:58:33.679175 45288 cgroups_isolator.cpp:700] Asked to update resources for an unknown/killed executor
I0929 08:58:33.679201 45300 status_update_manager.cpp:300] Received status update TASK_FAILED (UUID: 4d90de5a-cdad-4bb8-ab93-7c4f185a0d24) for task 1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f of framework 201205082337-0000000003-0000 
I0929 08:58:33.680452 45300 status_update_manager.hpp:337] Checkpointing UPDATE for status update TASK_FAILED (UUID: 4d90de5a-cdad-4bb8-ab93-7c4f185a0d24) for task 1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f of framework 201205082337-0000000003-0000 
    @     0x7f9035fb562d  google::LogMessage::Fail()
    @     0x7f9035fb9617  google::LogMessage::SendToLog()
    @     0x7f9035fb7f14  google::LogMessage::Flush()
I0929 08:58:35.929435 45300 status_update_manager.cpp:351] Forwarding status update TASK_FAILED (UUID: 4d90de5a-cdad-4bb8-ab93-7c4f185a0d24) for task 1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f of framework 201205082337-0000000003-0000 to master@10.42.69.138:5050
    @     0x7f9035fb8146  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f9035d1a83f  mesos::internal::ZooKeeperMasterDetectorProcess::connected()
    @     0x7f9035d1f118  std::tr1::_Function_handler<>::_M_invoke()
    @     0x7f9035d21b84  std::tr1::_Function_handler<>::_M_invoke()
    @     0x7f9035ea6f84  process::ProcessManager::resume()
    @     0x7f9035ea79df  process::schedule()
    @     0x7f903561083d  start_thread
    @     0x7f9033ff2f8d  clone
{noformat}

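The slave exited with SIGABRT after the fatal check at detector.cpp:266 ("invalid zhandle state"). Is this a ZooKeeper connection issue? The log shows the connection timing out and the session expiring immediately before the failure. Should Mesos handle this gracefully instead of aborting?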
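
To make the "handle this gracefully" question concrete, here is a minimal sketch of one possible policy, not the actual Mesos code. It assumes the return codes seen in the log (-7, -112) plus the code behind the "invalid zhandle state" message (-9, ZINVALIDSTATE in the ZooKeeper C client). The constants are redefined locally so the sketch compiles without the ZooKeeper headers, and the names `Action` and `classify` are hypothetical.

```cpp
#include <cassert>

// Numeric values mirror the error codes in the ZooKeeper C client's
// zookeeper.h; they are redefined here only so the sketch is self-contained.
enum ZkCode {
  kOk = 0,
  kConnectionLoss = -4,    // ZCONNECTIONLOSS
  kOperationTimeout = -7,  // ZOPERATIONTIMEOUT (retcode=-7 in the log)
  kInvalidState = -9,      // ZINVALIDSTATE ("invalid zhandle state")
  kSessionExpired = -112   // ZSESSIONEXPIRED (retcode=-112 in the log)
};

// Hypothetical policy: what a detector could do instead of aborting.
enum class Action {
  Proceed,    // the call succeeded
  Retry,      // transient failure: retry on the same handle
  Reconnect,  // handle is unusable: open a new session, then retry
  Abort       // unexpected error: failing fast is still reasonable
};

// Maps a ZooKeeper return code to the assumed recovery action.
inline Action classify(int rc) {
  switch (rc) {
    case kOk:
      return Action::Proceed;
    case kConnectionLoss:
    case kOperationTimeout:
      return Action::Retry;
    case kInvalidState:
    case kSessionExpired:
      // An expired session cannot be revived, so the handle must be replaced.
      return Action::Reconnect;
    default:
      return Action::Abort;
  }
}
```

Under this assumed policy, the failure in the log above (session expired, then an operation on the now-invalid handle) would map to `Action::Reconnect` rather than a fatal check. Whether reconnecting is safe for the detector is a design question this sketch does not answer.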


