Posted to dev@mesos.apache.org by Rohit N <ro...@tavant.com> on 2012/10/05 15:15:03 UTC

Master shuts down automatically after few seconds in a cluster

I am seeing an unusual error when I bring up my Mesos cluster using the mesos-start-cluster.sh script. I have 2 machines: the Master and Slave 1 run on Machine A, and Slave 2 runs on Machine B. When I bring up the cluster, all 3 processes (the Master and both slaves) start successfully. However, the master shuts down automatically after a few seconds, while the slave continues to run. The following is what I get from the INFO log on the Master:

I1006 02:55:27.068258 32699 hierarchical_allocator_process.hpp:345] Added slave 201210060255-16777343-5050-32697-0 (localhost) with cpus=4; mem=14517; disk=177248 (and cpus=4; mem=14517; disk=177248 available)
I1006 02:55:27.068335 32699 hierarchical_allocator_process.hpp:561] Performed allocation for slave 201210060255-16777343-5050-32697-0 in 739.00ns
I1006 02:55:27.068546 32700 hierarchical_allocator_process.hpp:371] Removed slave 201210060255-16777343-5050-32697-0
I1006 02:55:27.922291 32700 hierarchical_allocator_process.hpp:543] Performed allocation for 0 slaves in 1.83us
I1006 02:55:28.067174 32700 master.cpp:924] Attempting to register slave on localhost at slave(1)@127.0.0.1:44659
I1006 02:55:28.067221 32700 master.cpp:1160] Master now considering a slave at localhost:44659 as active
I1006 02:55:28.067246 32700 master.cpp:1740] Adding slave 201210060255-16777343-5050-32697-1 at localhost with cpus=4; mem=14517; disk=177248
I1006 02:55:28.067972 32700 master.cpp:520] Slave 201210060255-16777343-5050-32697-1(localhost) disconnected

The cluster comes up without any problem when I run just the Master and Slave 1 on Machine A, with Slave 2 on Machine B removed from the cluster setup (i.e. from the mesos/var/mesos/deploy/slaves file). Any insight is greatly appreciated. Please help.
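
For anyone reading the log excerpt above, here is a minimal Python sketch that splits each line into its glog-style fields and decodes the numeric field inside the slave ID. It rests on an assumption that is not confirmed anywhere in this thread: that the second dash-separated field of the ID (16777343 here) is an IPv4 address stored as an integer whose bytes read in little-endian order. The helper names `parse_glog` and `decode_id_ip` are hypothetical and are not part of Mesos.

```python
import re
import socket
import struct

# Prefix layout seen in the log above:
# <severity><MMDD> <HH:MM:SS.micros> <thread id> <file>:<line>] <message>
GLOG_RE = re.compile(
    r"^(?P<sev>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<tid>\d+) "
    r"(?P<src>[^:\s]+):(?P<line>\d+)\] (?P<msg>.*)$"
)


def parse_glog(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    m = GLOG_RE.match(line)
    return m.groupdict() if m else None


def decode_id_ip(entity_id):
    """Decode the second dash-separated field of an ID as an IPv4 address.

    Assumption (not confirmed by the thread): the field is a 32-bit integer
    whose bytes, in little-endian order, are the four octets of the address.
    """
    ip_int = int(entity_id.split("-")[1])
    return socket.inet_ntoa(struct.pack("<I", ip_int))


if __name__ == "__main__":
    sample = ("I1006 02:55:28.067972 32700 master.cpp:520] Slave "
              "201210060255-16777343-5050-32697-1(localhost) disconnected")
    record = parse_glog(sample)
    print(record["src"], record["line"], record["msg"])
    print(decode_id_ip("201210060255-16777343-5050-32697-1"))
```

Under that assumption, 16777343 decodes to 127.0.0.1, which matches the `slave(1)@127.0.0.1:44659` address in the log. Whether a loopback address has anything to do with the shutdown cannot be determined from this excerpt alone.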

Thanks & Regards,

Rohit.N

Re: Master shuts down automatically after few seconds in a cluster

Posted by Benjamin Mahler <bm...@twitter.com>.
Can you please reproduce this and provide the full logs for the master /
slaves?

There's currently not enough information here for me to be able to diagnose
the issue.

Thanks!
Ben
