Posted to user@mesos.apache.org by xiaokun <xi...@gmail.com> on 2015/01/28 04:48:00 UTC

Slave cannot be registered while masters keep switching to another one.

I followed the instructions on this page:
http://mesosphere.com/docs/getting-started/datacenter/install/
I set up two masters and one slave, with the quorum value set to "2", and
configured the IP address in the hostname file on each node.
Here is the log from the slave node:
I0127 22:37:26.762953  1966 slave.cpp:627] No credentials provided. Attempting to register without authentication
I0127 22:37:26.762985  1966 slave.cpp:638] Detecting new master
I0127 22:37:26.763022  1966 status_update_manager.cpp:171] Pausing sending status updates
I0127 22:38:06.683840  1962 slave.cpp:3321] Current usage 16.98%. Max allowed age: 5.111732713224155days
I0127 22:38:26.986556  1966 slave.cpp:2623] master@10.27.17.135:5050 exited
W0127 22:38:26.986675  1966 slave.cpp:2626] Master disconnected! Waiting for a new master to be elected
I0127 22:38:34.909605  1963 detector.cpp:138] Detected a new leader: (id='2028')
I0127 22:38:34.909811  1963 group.cpp:659] Trying to get '/mesos/info_0000002028' in ZooKeeper
I0127 22:38:34.910909  1963 detector.cpp:433] A new leading master (UPID=master@10.27.16.214:5050) is detected
I0127 22:38:34.910989  1963 slave.cpp:602] New master detected at master@10.27.16.214:5050
I0127 22:38:34.911113  1963 slave.cpp:627] No credentials provided. Attempting to register without authentication
I0127 22:38:34.911144  1963 slave.cpp:638] Detecting new master
I0127 22:38:34.911183  1963 status_update_manager.cpp:171] Pausing sending status updates
I0127 22:39:06.684526  1964 slave.cpp:3321] Current usage 16.98%. Max allowed age: 5.111731773610567days
I0127 22:39:35.231653  1963 slave.cpp:2623] master@10.27.16.214:5050 exited
W0127 22:39:35.231869  1963 slave.cpp:2626] Master disconnected! Waiting for a new master to be elected
I0127 22:39:42.761540  1964 detector.cpp:138] Detected a new leader: (id='2029')
I0127 22:39:42.761732  1964 group.cpp:659] Trying to get '/mesos/info_0000002029' in ZooKeeper
I0127 22:39:42.762914  1964 detector.cpp:433] A new leading master (UPID=master@10.27.17.135:5050) is detected
I0127 22:39:42.762984  1964 slave.cpp:602] New master detected at master@10.27.17.135:5050
I0127 22:39:42.763089  1964 slave.cpp:627] No credentials provided. Attempting to register without authentication
I0127 22:39:42.763118  1964 slave.cpp:638] Detecting new master
I0127 22:39:42.763155  1964 status_update_manager.cpp:171] Pausing sending status updates

Whenever the slave tries to connect, the master exits and another one is
elected. Is there anything wrong with my configuration?

Thanks,
Xiaokun

Re: Slave cannot be registered while masters keep switching to another one.

Posted by xiaokun <xi...@gmail.com>.
Thanks for your explanation!

2015-01-28 18:02 GMT+08:00 Dick Davies <di...@hellooperator.net>:
>
> Be careful, there's now nothing stopping those 2 masters from forming
> 2 clusters.
> Add a third asap.
>
>
>
> On 28 January 2015 at 08:25, xiaokun <xi...@gmail.com> wrote:
> > hi, I changed the quorum to 1. Slave can be displayed now!
> >
> > Thanks!

Re: Slave cannot be registered while masters keep switching to another one.

Posted by Dick Davies <di...@hellooperator.net>.
Be careful, there's now nothing stopping those 2 masters from forming
2 clusters.
Add a third asap.
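
To make the risk concrete, here is a small sketch under a simplified
majority-vote model. It is not Mesos's actual replicated-log code, and the
function name is hypothetical. The idea: two groups of masters that cannot
see each other can both make progress only if each group can reach the
quorum on its own.

```python
def disjoint_quorums_possible(masters: int, quorum: int) -> bool:
    """Return True if `masters` nodes can be split into two non-overlapping
    groups that each contain at least `quorum` nodes.

    Simplified model (an assumption for illustration): a group may act as
    soon as it holds `quorum` members, regardless of what the rest sees.
    """
    if masters < 1 or quorum < 1:
        raise ValueError("masters and quorum must be positive")
    return 2 * quorum <= masters


# The setup in this thread after the change: 2 masters, quorum 1.
print(disjoint_quorums_possible(masters=2, quorum=1))
# The suggested setup: 3 masters, quorum 2.
print(disjoint_quorums_possible(masters=3, quorum=2))
```

With 2 masters and a quorum of 1, each master satisfies the quorum by
itself, which is the situation the warning above refers to. With 3 masters
and a quorum of 2, any two groups that both reach quorum must share a member.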



On 28 January 2015 at 08:25, xiaokun <xi...@gmail.com> wrote:
> hi, I changed the quorum to 1. Slave can be displayed now!
>
> Thanks!

Re: Slave cannot be registered while masters keep switching to another one.

Posted by xiaokun <xi...@gmail.com>.
Hi, I changed the quorum to 1. The slave is displayed now!

Thanks!


Re: Slave cannot be registered while masters keep switching to another one.

Posted by xiaokun <xi...@gmail.com>.
Thanks for your reply. I will try changing the quorum to 1.
Here is the log from the server side. An attachment is included.
I0128 03:15:36.608562 15350 replica.cpp:638] Replica in VOTING status received a broadcasted recover request
I0128 03:15:37.552141 15346 replica.cpp:638] Replica in VOTING status received a broadcasted recover request
I0128 03:15:38.479542 15345 network.hpp:424] ZooKeeper group memberships changed
I0128 03:15:38.479799 15345 group.cpp:659] Trying to get '/mesos/log_replicas/0000002270' in ZooKeeper
I0128 03:15:38.480613 15345 group.cpp:659] Trying to get '/mesos/log_replicas/0000002271' in ZooKeeper
I0128 03:15:38.481050 15345 group.cpp:659] Trying to get '/mesos/log_replicas/0000002272' in ZooKeeper
I0128 03:15:38.481679 15345 network.hpp:466] ZooKeeper group PIDs: { log-replica(1)@10.27.17.135:5050, log-replica(1)@10.27.16.214:5050 }
I0128 03:15:38.621351 15345 replica.cpp:638] Replica in VOTING status received a broadcasted recover request
I0128 03:15:39.544558 15345 replica.cpp:638] Replica in VOTING status received a broadcasted recover request
I0128 03:15:40.072347 15343 replica.cpp:638] Replica in VOTING status received a broadcasted recover request
I0128 03:15:41.025926 15345 replica.cpp:638] Replica in VOTING status received a broadcasted recover request
I0128 03:15:41.695303 15349 replica.cpp:638] Replica in VOTING status received a broadcasted recover request
I0128 03:15:42.493906 15345 replica.cpp:638] Replica in VOTING status received a broadcasted recover request
I0128 03:15:43.086762 15343 replica.cpp:638] Replica in VOTING status received a broadcasted recover request
I0128 03:15:43.831442 15346 replica.cpp:638] Replica in VOTING status received a broadcasted recover request
I0128 03:15:44.787384 15343 replica.cpp:638] Replica in VOTING status received a broadcasted recover request
I0128 03:15:45.527914 15345 replica.cpp:638] Replica in VOTING status received a broadcasted recover request
I0128 03:15:46.005728 15349 detector.cpp:138] Detected a new leader: (id='2272')
I0128 03:15:46.005892 15349 group.cpp:659] Trying to get '/mesos/info_0000002272' in ZooKeeper
I0128 03:15:46.006530 15349 detector.cpp:433] A new leading master (UPID=master@10.27.16.214:5050) is detected
I0128 03:15:46.006624 15349 master.cpp:1263] The newly elected leader is master@10.27.16.214:5050 with id 20150128-031430-3591379722-5050-15326
I0128 03:15:46.006664 15349 master.cpp:1276] Elected as the leading master!

Re: Slave cannot be registered while masters keep switching to another one.

Posted by Adam Bordelon <ad...@mesosphere.io>.
First thing I see is that you have a quorum of 2, but only 2 masters. For a
quorum of 2, you should have 3 masters. Logic: for an odd number of M
masters, a quorum/majority of ceiling(M/2) is necessary to win a vote. For
quorum Q, you should launch (2*Q)-1 masters.
What you are showing is expected behavior for a slave when the masters keep
switching. It would be more valuable to see the log from one of the masters
to determine why it shuts down.
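
As a rough illustration of the quorum arithmetic above, here is a short
sketch. The function names are made up for this example and are not part of
Mesos; it only restates the majority rule, it does not explain why a master
exits.

```python
def quorum_for(masters: int) -> int:
    """Smallest strict majority of `masters` voters.

    For an odd number of masters this equals ceiling(masters / 2).
    """
    if masters < 1:
        raise ValueError("need at least one master")
    return masters // 2 + 1


def masters_for(quorum: int) -> int:
    """Number of masters to launch for a given quorum: (2 * quorum) - 1."""
    if quorum < 1:
        raise ValueError("quorum must be positive")
    return 2 * quorum - 1


def tolerated_failures(masters: int, quorum: int) -> int:
    """How many masters can be down while a quorum is still reachable."""
    return max(masters - quorum, 0)


print(masters_for(2))            # masters to launch for quorum 2
print(quorum_for(3))             # majority among 3 masters
print(tolerated_failures(2, 2))  # the setup in this thread
print(tolerated_failures(3, 2))  # the suggested setup
```

Under this rule, two masters with a quorum of 2 leave no room for either
master to be unavailable, while three masters with a quorum of 2 can lose one.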
