Posted to issues@mesos.apache.org by "haosdent (JIRA)" <ji...@apache.org> on 2016/09/13 13:28:21 UTC

[jira] [Commented] (MESOS-6155) Mesos Master Crashing

    [ https://issues.apache.org/jira/browse/MESOS-6155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15487197#comment-15487197 ] 

haosdent commented on MESOS-6155:
---------------------------------

The quorum needs to be set to 2 (--quorum=2) if you have 3 masters.
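
As a rough sketch of the arithmetic behind this advice: the Mesos documentation for the --quorum flag asks for a value that is a majority of the masters, i.e. quorum > (number of masters) / 2. The helper names below (majority_quorum, tolerated_failures) are hypothetical illustrations and not part of Mesos.

```python
def majority_quorum(num_masters: int) -> int:
    """Smallest number of masters that forms a strict majority."""
    if num_masters < 1:
        raise ValueError("need at least one master")
    return num_masters // 2 + 1


def tolerated_failures(num_masters: int, quorum: int) -> int:
    """How many masters can be unavailable while a quorum is still reachable."""
    return num_masters - quorum


if __name__ == "__main__":
    # Print the majority quorum for a few common cluster sizes.
    for n in (1, 3, 5):
        q = majority_quorum(n)
        print(f"masters={n} quorum={q} tolerated_failures={tolerated_failures(n, q)}")
```

For the three masters in this report, the sketch gives a quorum of 2, which leaves room for one master to be unavailable; with --quorum=3 there is no such margin.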

> Mesos Master Crashing
> ---------------------
>
>                 Key: MESOS-6155
>                 URL: https://issues.apache.org/jira/browse/MESOS-6155
>             Project: Mesos
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 1.0.1
>            Reporter: Sankar Mittapally
>
> Hi,
>  We have a three-node cluster (quorum=3). If I start the cluster with the information for these three nodes, it crashes within 1 minute. Please suggest how to fix this.
> I0913 13:04:08.780151 14220 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (188)@172.31.28.208:5050
> I0913 13:04:08.780427 14219 recover.cpp:197] Received a recover response from a replica in EMPTY status
> I0913 13:04:08.781888 14222 recover.cpp:197] Received a recover response from a replica in EMPTY status
> I0913 13:04:08.788697 14220 recover.cpp:197] Received a recover response from a replica in EMPTY status
> I0913 13:04:08.789645 14215 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (9)@172.31.14.48:5050
> I0913 13:04:09.345007 14215 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (192)@172.31.28.208:5050
> I0913 13:04:09.345319 14219 recover.cpp:197] Received a recover response from a replica in EMPTY status
> I0913 13:04:09.347131 14219 recover.cpp:197] Received a recover response from a replica in EMPTY status
> I0913 13:04:09.348122 14221 recover.cpp:197] Received a recover response from a replica in EMPTY status
> I0913 13:04:09.437337 14221 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (198)@172.31.22.57:5050
> I0913 13:04:09.531939 14221 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (13)@172.31.14.48:5050
> I0913 13:04:10.145277 14215 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (196)@172.31.28.208:5050
> I0913 13:04:10.145606 14215 recover.cpp:197] Received a recover response from a replica in EMPTY status
> I0913 13:04:10.146411 14216 recover.cpp:197] Received a recover response from a replica in EMPTY status
> I0913 13:04:10.146929 14216 recover.cpp:197] Received a recover response from a replica in EMPTY status
> I0913 13:04:10.369940 14222 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (202)@172.31.22.57:5050
> I0913 13:04:10.374109 14218 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (16)@172.31.14.48:5050
> I0913 13:04:10.712461 14219 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (199)@172.31.28.208:5050
> I0913 13:04:10.712688 14222 recover.cpp:197] Received a recover response from a replica in EMPTY status
> I0913 13:04:10.714576 14216 recover.cpp:197] Received a recover response from a replica in EMPTY status
> I0913 13:04:10.715409 14219 recover.cpp:197] Received a recover response from a replica in EMPTY status
> I0913 13:04:10.977103 14219 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (205)@172.31.22.57:5050
> I0913 13:04:11.158926 14222 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (19)@172.31.14.48:5050
> I0913 13:04:11.235548 14216 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (202)@172.31.28.208:5050
> I0913 13:04:11.235769 14215 recover.cpp:197] Received a recover response from a replica in EMPTY status
> I0913 13:04:11.237684 14220 recover.cpp:197] Received a recover response from a replica in EMPTY status
> I0913 13:04:11.238503 14216 recover.cpp:197] Received a recover response from a replica in EMPTY status
> I0913 13:04:11.697820 14216 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (208)@172.31.22.57:5050
> I0913 13:04:11.937495 14220 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (22)@172.31.14.48:5050
> I0913 13:04:11.986651 14220 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (205)@172.31.28.208:5050
> I0913 13:04:11.986855 14222 recover.cpp:197] Received a recover response from a replica in EMPTY status
> I0913 13:04:11.988693 14222 recover.cpp:197] Received a recover response from a replica in EMPTY status
> I0913 13:04:11.989394 14220 recover.cpp:197] Received a recover response from a replica in EMPTY status
> I0913 13:04:12.504758 14222 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (208)@172.31.28.208:5050
> I0913 13:04:12.505023 14218 recover.cpp:197] Received a recover response from a replica in EMPTY status
> I0913 13:04:12.507000 14222 recover.cpp:197] Received a recover response from a replica in EMPTY status
> I0913 13:04:12.507858 14222 recover.cpp:197] Received a recover response from a replica in EMPTY status
> I0913 13:04:12.666591 14215 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (211)@172.31.22.57:5050
> I0913 13:04:12.784739 14215 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (25)@172.31.14.48:5050
> I0913 13:04:13.215558 14215 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (211)@172.31.28.208:5050
> I0913 13:04:13.215870 14217 recover.cpp:197] Received a recover response from a replica in EMPTY status
> I0913 13:04:13.217674 14222 recover.cpp:197] Received a recover response from a replica in EMPTY status
> I0913 13:04:13.218454 14219 recover.cpp:197] Received a recover response from a replica in EMPTY status
> I0913 13:04:13.495076 14218 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (214)@172.31.22.57:5050
> I0913 13:04:13.620205 14218 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (28)@172.31.14.48:5050
> F0913 13:04:13.694931 14219 master.cpp:1536] Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins
> *** Check failure stack trace: ***
>     @     0x7f7f210a8c56  google::LogMessage::Fail()
>     @     0x7f7f210a8bb5  google::LogMessage::SendToLog()
>     @     0x7f7f210a85c6  google::LogMessage::Flush()
>     @     0x7f7f210ab2fa  google::LogMessageFatal::~LogMessageFatal()
>     @     0x7f7f20060a28  mesos::internal::master::fail()
>     @     0x7f7f2012fd41  _ZNSt5_BindIFPFvRKSsS1_EPKcSt12_PlaceholderILi1EEEE6__callIvJS1_EJLm0ELm1EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
>     @     0x7f7f2010d533  _ZNSt5_BindIFPFvRKSsS1_EPKcSt12_PlaceholderILi1EEEEclIJS1_EvEET0_DpOT_
>     @     0x7f7f200dbdff  _ZZNK7process6FutureI7NothingE8onFailedISt5_BindIFPFvRKSsS6_EPKcSt12_PlaceholderILi1EEEEvEERKS2_OT_NS2_6PreferEENUlS6_E_clES6_
>     @     0x7f7f2012fe0e  _ZNSt17_Function_handlerIFvRKSsEZNK7process6FutureI7NothingE8onFailedISt5_BindIFPFvS1_S1_EPKcSt12_PlaceholderILi1EEEEvEERKS6_OT_NS6_6PreferEEUlS1_E_E9_M_invokeERKSt9_Any_dataS1_
>     @           0x49f4db  std::function<>::operator()()
>     @           0x49a35f  _ZN7process8internal3runISt8functionIFvRKSsEEJS4_EEEvRKSt6vectorIT_SaIS8_EEDpOT0_
>     @           0x494ec6  process::Future<>::fail()
>     @     0x7f7f1fd2bbc4  process::Promise<>::fail()
>     @     0x7f7f2012d145  process::internal::thenf<>()
>     @     0x7f7f2016dacf  _ZNSt5_BindIFPFvRKSt8functionIFN7process6FutureI7NothingEERKN5mesos8internal8RegistryEEERKSt10shared_ptrINS1_7PromiseIS3_EEERKNS2_IS7_EEESB_SH_St12_PlaceholderILi1EEEE6__callIvISM_EILm0ELm1ELm2EEEET_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE
>     @     0x7f7f20165689  std::_Bind<>::operator()<>()
>     @     0x7f7f2014e5d5  std::_Function_handler<>::_M_invoke()
>     @     0x7f7f201658b7  std::function<>::operator()()
>     @     0x7f7f2014e6fb  _ZZNK7process6FutureIN5mesos8internal8RegistryEE5onAnyIRSt8functionIFvRKS4_EEvEES8_OT_NS4_6PreferEENUlS8_E_clES8_
>     @     0x7f7f2016db76  _ZNSt17_Function_handlerIFvRKN7process6FutureIN5mesos8internal8RegistryEEEEZNKS5_5onAnyIRSt8functionIS8_EvEES7_OT_NS5_6PreferEEUlS7_E_E9_M_invokeERKSt9_Any_dataS7_
>     @     0x7f7f201658b7  std::function<>::operator()()
>     @     0x7f7f201cb846  process::internal::run<>()
>     @     0x7f7f201c1b86  process::Future<>::fail()
>     @     0x7f7f201fc30d  std::_Mem_fn<>::operator()<>()
>     @     0x7f7f201f773b  _ZNSt5_BindIFSt7_Mem_fnIMN7process6FutureIN5mesos8internal8RegistryEEEFbRKSsEES6_St12_PlaceholderILi1EEEE6__callIbIS8_EILm0ELm1EEEET_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE
>     @     0x7f7f201efd6f  _ZNSt5_BindIFSt7_Mem_fnIMN7process6FutureIN5mesos8internal8RegistryEEEFbRKSsEES6_St12_PlaceholderILi1EEEEclIJS8_EbEET0_DpOT_
>     @     0x7f7f201e5d25  _ZZNK7process6FutureIN5mesos8internal8RegistryEE8onFailedISt5_BindIFSt7_Mem_fnIMS4_FbRKSsEES4_St12_PlaceholderILi1EEEEbEERKS4_OT_NS4_6PreferEENUlS9_E_clES9_
>     @     0x7f7f201f77c1  _ZNSt17_Function_handlerIFvRKSsEZNK7process6FutureIN5mesos8internal8RegistryEE8onFailedISt5_BindIFSt7_Mem_fnIMS8_FbS1_EES8_St12_PlaceholderILi1EEEEbEERKS8_OT_NS8_6PreferEEUlS1_E_E9_M_invokeERKSt9_Any_dataS1_
>     @           0x49f4db  std::function<>::operator()()
>     @           0x49a35f  _ZN7process8internal3runISt8functionIFvRKSsEEJS4_EEEvRKSt6vectorIT_SaIS8_EEDpOT0_
>     @     0x7f7f201c1b5e  process::Future<>::fail()
>     @     0x7f7f201baa02  process::Promise<>::fail()
> mesos-master.sh --zk=zk://172.31.14.18:2181,172.31.28.208:2181,172.31.22.57:2181/mesos --work_dir=/var/lib/mesos --quorum=3
> This is the command I passed when starting the cluster.