Posted to dev@mesos.apache.org by Eduardo Alfaia <ed...@gmail.com> on 2013/04/15 19:13:32 UTC

Slave Removedo

Hi Guys,
I am new to Mesos and I am having some problems when running the Mesos launch
scripts below. Why does the master remove the slave? I have seen
something about checkpointing.

MASTER
root@blockmon1:/opt/mesos-trunk/build/bin# ./mesos-master.sh
I0415 18:00:47.543422 17720 main.cpp:116] Build: 2013-04-14 23:48:51 by root
I0415 18:00:47.543926 17720 main.cpp:117] Starting Mesos master
I0415 18:00:47.545109 17720 master.cpp:309] Master started on 127.0.1.1:5050
I0415 18:00:47.545351 17720 master.cpp:324] Master ID:
201304151800-16842879-5050-17720
I0415 18:00:47.545819 17720 master.cpp:603] Elected as master!
W0415 18:00:47.546039 17737 master.cpp:81] No whitelist given. Advertising
offers for all slaves
W0415 18:00:52.547684 17736 master.cpp:81] No whitelist given. Advertising
offers for all slaves
W0415 18:00:57.550519 17736 master.cpp:81] No whitelist given. Advertising
offers for all slaves

se it is not checkpointing!
I0415 18:01:59.379076 17735 hierarchical_allocator_process.hpp:423] Removed
slave 201304151800-16842879-5050-17720-28
I0415 18:02:00.379822 17737 master.cpp:968] Attempting to register slave on
blockmon2 at slave(1)@127.0.1.1:36820
I0415 18:02:00.380177 17737 master.cpp:1224] Master now considering a slave
at blockmon2:36820 as active
I0415 18:02:00.380561 17737 master.cpp:1862] Adding slave
201304151800-16842879-5050-17720-29 at blockmon2 with cpus=1; mem=979;
ports=[31000-32000]; disk=2801
I0415 18:02:00.380813 17737 hierarchical_allocator_process.hpp:395] Added
slave 201304151800-16842879-5050-17720-29 (blockmon2) with cpus=1; mem=979;
ports=[31000-32000]; disk=2801 (and cpus=1; mem=979; ports=[31000-32000];
disk=2801 available)
I0415 18:02:00.381255 17734 master.cpp:537] Slave
201304151800-16842879-5050-17720-29(blockmon2) disconnected
I0415 18:02:00.381474 17734 master.cpp:542] Removing disconnected slave
201304151800-16842879-5050-17720-29(blockmon2) because it is not
checkpointing!
I0415 18:02:00.381882 17735 hierarchical_allocator_process.hpp:423] Removed
slave 201304151800-16842879-5050-17720-29

Thanks Guys

-- 
MSc Eduardo Costa Alfaia
PhD Student
Università degli Studi di Brescia
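
One detail worth noting in the log above: both the master ("Master started on 127.0.1.1:5050") and the registering slave ("slave(1)@127.0.1.1:36820") report an address in the 127.0.0.0/8 loopback range. A minimal sketch, assuming glog-style text like the lines above, that scans a log for loopback endpoints. The function name `loopback_endpoints` and the regular expression are illustrative and not part of Mesos.

```python
import ipaddress
import re

# Matches an IPv4 address followed by a port, e.g. "127.0.1.1:5050".
ADDR_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3}):(\d+)")


def loopback_endpoints(log_text):
    """Return the sorted, de-duplicated 'ip:port' strings whose IP is loopback."""
    found = set()
    for ip, port in ADDR_RE.findall(log_text):
        try:
            if ipaddress.ip_address(ip).is_loopback:
                found.add(f"{ip}:{port}")
        except ValueError:
            # Looks like a dotted quad but is not a valid address; skip it.
            continue
    return sorted(found)


if __name__ == "__main__":
    sample = (
        "I0415 18:00:47.545109 17720 master.cpp:309] Master started on "
        "127.0.1.1:5050\n"
        "I0415 18:02:00.379822 17737 master.cpp:968] Attempting to register "
        "slave on blockmon2 at slave(1)@127.0.1.1:36820\n"
    )
    for endpoint in loopback_endpoints(sample):
        print(endpoint)
```

A loopback address is only reachable from the same machine, so a master and slave on different hosts that both advertise one cannot connect back to each other.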

Re: Slave Removedo

Posted by Eduardo Alfaia <ed...@gmail.com>.
Indeed, Benjamin, I'm running trunk, but with the latest version I've had
problems with gcc-4.7.


2013/4/15 Benjamin Mahler <be...@gmail.com>

> Also, it looks like you're running off trunk, can you run the latest
> release instead?
> Trunk is not vetted in any production environment, so you'll be using it at
> your own risk.
>
>
> On Mon, Apr 15, 2013 at 10:22 AM, Vinod Kone <vi...@gmail.com> wrote:
>
> > Hi Eduardo,
> >
> > This looks like a networking issue. What is your cluster setup like?
> >
> > Are you running on Amazon EC2? We have seen similar behavior before when
> > users were running Mesos on EC2. If I remember correctly, the fix was to
> > use private IP addresses for master and slaves, instead of "localhost" or
> > "public ip".
> >
> > @vinodkone



-- 
MSc Eduardo Costa Alfaia
PhD Student
Università degli Studi di Brescia

Re: Slave Removedo

Posted by Benjamin Mahler <be...@gmail.com>.
Also, it looks like you're running off trunk, can you run the latest
release instead?
Trunk is not vetted in any production environment, so you'll be using it at
your own risk.



Re: Slave Removedo

Posted by Benjamin Mahler <be...@gmail.com>.
Filed: https://issues.apache.org/jira/browse/MESOS-435


On Mon, Apr 15, 2013 at 11:27 AM, Benjamin Mahler <benjamin.mahler@gmail.com> wrote:

> Seems like we need to fix this as it's a common hurdle.
>
> Has anyone looked at the root of this problem?
>
>
> On Mon, Apr 15, 2013 at 11:25 AM, Eduardo Alfaia <eduardocalfaia@gmail.com> wrote:
>
>> Hi Guys,
>>
>> Using the private IP instead of the FQDN works; however, I had to change
>> /etc/hosts.
>>
>> thanks
>>
>>
>> 2013/4/15 Benjamin Mahler <be...@gmail.com>
>>
>> > Can you try using the private IP instead? You can find it using ifconfig.
>> >
>> >
>> > On Mon, Apr 15, 2013 at 10:33 AM, Eduardo Alfaia
>> > <ed...@gmail.com>wrote:
>> >
>> > > Hi Vinod, thanks for your fast reply.
>> > >
>> > > I'm not using EC2, but I'm using the server's hostname, for example
>> > > blockmon1.ing.unibs.it. Could that be the cause?
>> > >
>> > > I'm using 3 nodes (1 master and 2 slaves).
>> > >
>> > > Regards

Re: Slave Removedo

Posted by Benjamin Mahler <be...@gmail.com>.
Seems like we need to fix this as it's a common hurdle.

Has anyone looked at the root of this problem?



Re: Slave Removedo

Posted by Eduardo Alfaia <ed...@gmail.com>.
Hi Guys,

Using the private IP instead of the FQDN works; however, I had to change
/etc/hosts.

thanks
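
For anyone hitting the same symptom, here is a rough sketch of how to check what a machine's own hostname resolves to. It rests on an assumption the thread does not confirm for this cluster: some Linux distributions map the hostname to 127.0.1.1 in /etc/hosts, which would explain both the loopback address in the master log and the need to edit /etc/hosts. The helper names `is_loopback` and `resolved_addresses` are illustrative; only the Python standard library is used.

```python
import ipaddress
import socket


def is_loopback(addr):
    """True if addr (an IPv4 or IPv6 address string) is a loopback address."""
    # Drop any IPv6 scope suffix such as "%eth0" before parsing.
    return ipaddress.ip_address(addr.split("%", 1)[0]).is_loopback


def resolved_addresses(hostname):
    """Return the distinct addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    name = socket.gethostname()
    # getaddrinfo raises socket.gaierror if the name cannot be resolved.
    for addr in resolved_addresses(name):
        if is_loopback(addr):
            note = "loopback, not reachable from other hosts"
        else:
            note = "ok"
        print(f"{name} -> {addr} ({note})")
```

If the hostname resolves to a loopback address, pointing it at the machine's private IP in /etc/hosts (or starting the master and slaves with the private IP, as suggested above) is the kind of change that resolved the problem in this thread.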


2013/4/15 Benjamin Mahler <be...@gmail.com>

> Can you try using the private IP instead? You can find it using ifconfig.
>
>
> On Mon, Apr 15, 2013 at 10:33 AM, Eduardo Alfaia
> <ed...@gmail.com>wrote:
>
> > Hi Vinod, thanks by your fast replay
> >
> > I'm not using EC2 but I'm using the name of server like, for example
> > blockmon1.ing.unibs.it. Could be this?
> >
> > I'm using 3 nodes ( 1 Master and 2 Slaves)
> >
> > Regards
> >
> >
> > 2013/4/15 Vinod Kone <vi...@gmail.com>
> >
> > > Hi Eduardo,
> > >
> > > This looks like a networking issue. What is your cluster setup like?
> > >
> > > Are you running on Amazon EC2? We have seen similar behavior before
> when
> > > users were running Mesos on EC2. If I remember correctly, the fix was
> to
> > to
> > > use private ip addresses for master and slaves, instead of "localhost"
> or
> > > "public ip".
> > >
> > > @vinodkone
> > >
> > >
> >
> >
> >
> > --
> > MSc Eduardo Costa Alfaia
> > PhD Student
> > Università degli Studi di Brescia
> >
>



-- 
MSc Eduardo Costa Alfaia
PhD Student
Università degli Studi di Brescia

Re: Slave Removedo

Posted by Benjamin Mahler <be...@gmail.com>.
Can you try using the private IP instead? You can find it using ifconfig.
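
As a complement to reading the ifconfig output by hand, here is a minimal
sketch of how a node could report which local address it would use to reach
another machine, and whether that address is loopback, private, or public.
The function names local_address_towards and describe are hypothetical
helpers written for this illustration, not part of Mesos; port 5050 is taken
from the master log in this thread.

```python
import ipaddress
import socket


def local_address_towards(peer_host, peer_port=5050):
    """Return the local IP the kernel would use to reach peer_host.

    Connecting a UDP socket only selects a route and a source address;
    no packet is sent.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.connect((peer_host, peer_port))
        return sock.getsockname()[0]


def describe(address):
    """Classify an IP address string as loopback, private, or public."""
    ip = ipaddress.ip_address(address)
    # Check loopback first: 127.0.0.0/8 also counts as private in ipaddress.
    if ip.is_loopback:
        return "loopback"
    if ip.is_private:
        return "private"
    return "public"
```

On a slave one might call describe(local_address_towards("blockmon1.ing.unibs.it"))
and expect "private" on a typical LAN. An address such as 127.0.1.1, which the
master log above reports, is classified as "loopback" and cannot be reached
from other hosts.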


On Mon, Apr 15, 2013 at 10:33 AM, Eduardo Alfaia
<ed...@gmail.com> wrote:

> Hi Vinod, thanks for your fast reply.
>
> I'm not using EC2, but I am using server hostnames, for example
> blockmon1.ing.unibs.it. Could that be the cause?
>
> I'm using 3 nodes (1 master and 2 slaves).
>
> Regards
>
>
> 2013/4/15 Vinod Kone <vi...@gmail.com>
>
> > Hi Eduardo,
> >
> > This looks like a networking issue. What is your cluster setup like?
> >
> > Are you running on Amazon EC2? We have seen similar behavior before when
> > users were running Mesos on EC2. If I remember correctly, the fix was to
> > use private IP addresses for master and slaves, instead of "localhost" or
> > "public ip".
> >
> > @vinodkone
> >
> >
>
> --
> MSc Eduardo Costa Alfaia
> PhD Student
> Università degli Studi di Brescia
>

Re: Slave Removedo

Posted by Eduardo Alfaia <ed...@gmail.com>.
Hi Vinod, thanks for your fast reply.

I'm not using EC2, but I am using server hostnames, for example
blockmon1.ing.unibs.it. Could that be the cause?

I'm using 3 nodes (1 master and 2 slaves).

Regards


2013/4/15 Vinod Kone <vi...@gmail.com>

> Hi Eduardo,
>
> This looks like a networking issue. What is your cluster setup like?
>
> Are you running on Amazon EC2? We have seen similar behavior before when
> users were running Mesos on EC2. If I remember correctly, the fix was to
> use private IP addresses for master and slaves, instead of "localhost" or
> "public ip".
>
> @vinodkone
>
>



-- 
MSc Eduardo Costa Alfaia
PhD Student
Università degli Studi di Brescia

Re: Slave Removedo

Posted by Vinod Kone <vi...@gmail.com>.
Hi Eduardo,

This looks like a networking issue. What is your cluster setup like?

Are you running on Amazon EC2? We have seen similar behavior before when
users were running Mesos on EC2. If I remember correctly, the fix was to
use private IP addresses for master and slaves, instead of "localhost" or
"public ip".

@vinodkone
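
One way to check for the addressing problem described above is to scan the
master log for endpoints that sit on a loopback address, since such endpoints
are only reachable from the same host. The following is a minimal sketch under
that assumption; loopback_endpoints is a hypothetical helper, and the sample
lines are copied (unwrapped) from the log posted in this thread.

```python
import ipaddress
import re

# Matches dotted-quad "ip:port" endpoints such as 127.0.1.1:5050.
ENDPOINT = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3}):(\d+)\b")


def loopback_endpoints(log_text):
    """Return every 'ip:port' in log_text whose IP is a loopback address."""
    found = []
    for ip, port in ENDPOINT.findall(log_text):
        try:
            if ipaddress.ip_address(ip).is_loopback:
                found.append(f"{ip}:{port}")
        except ValueError:
            # Not a valid IP (for example an octet above 255); skip it.
            continue
    return found


SAMPLE = """\
I0415 18:00:47.545109 17720 master.cpp:309] Master started on 127.0.1.1:5050
I0415 18:02:00.379822 17737 master.cpp:968] Attempting to register slave on blockmon2 at slave(1)@127.0.1.1:36820
I0415 18:02:00.380177 17737 master.cpp:1224] Master now considering a slave at blockmon2:36820 as active
"""

print(loopback_endpoints(SAMPLE))
```

For the sample lines this reports both the master endpoint and the slave
endpoint, because each side announced itself on 127.0.1.1.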


On Mon, Apr 15, 2013 at 10:13 AM, Eduardo Alfaia <ed...@gmail.com>
 wrote:

> Hi Guys,
> I am new to Mesos and I am having some problems when running the Mesos
> launch scripts below. Why does the master remove the slave? I have seen
> something about checkpoint.
>
> MASTER
> root@blockmon1:/opt/mesos-trunk/build/bin# ./mesos-master.sh
> I0415 18:00:47.543422 17720 main.cpp:116] Build: 2013-04-14 23:48:51 by
> root
> I0415 18:00:47.543926 17720 main.cpp:117] Starting Mesos master
> I0415 18:00:47.545109 17720 master.cpp:309] Master started on
> 127.0.1.1:5050
> I0415 18:00:47.545351 17720 master.cpp:324] Master ID:
> 201304151800-16842879-5050-17720
> I0415 18:00:47.545819 17720 master.cpp:603] Elected as master!
> W0415 18:00:47.546039 17737 master.cpp:81] No whitelist given. Advertising
> offers for all slaves
> W0415 18:00:52.547684 17736 master.cpp:81] No whitelist given. Advertising
> offers for all slaves
> W0415 18:00:57.550519 17736 master.cpp:81] No whitelist given. Advertising
> offers for all slaves
>
> se it is not checkpointing!
> I0415 18:01:59.379076 17735 hierarchical_allocator_process.hpp:423] Removed
> slave 201304151800-16842879-5050-17720-28
> I0415 18:02:00.379822 17737 master.cpp:968] Attempting to register slave on
> blockmon2 at slave(1)@127.0.1.1:36820
> I0415 18:02:00.380177 17737 master.cpp:1224] Master now considering a slave
> at blockmon2:36820 as active
> I0415 18:02:00.380561 17737 master.cpp:1862] Adding slave
> 201304151800-16842879-5050-17720-29 at blockmon2 with cpus=1; mem=979;
> ports=[31000-32000]; disk=2801
> I0415 18:02:00.380813 17737 hierarchical_allocator_process.hpp:395] Added
> slave 201304151800-16842879-5050-17720-29 (blockmon2) with cpus=1; mem=979;
> ports=[31000-32000]; disk=2801 (and cpus=1; mem=979; ports=[31000-32000];
> disk=2801 available)
> I0415 18:02:00.381255 17734 master.cpp:537] Slave
> 201304151800-16842879-5050-17720-29(blockmon2) disconnected
> I0415 18:02:00.381474 17734 master.cpp:542] Removing disconnected slave
> 201304151800-16842879-5050-17720-29(blockmon2) because it is not
> checkpointing!
> I0415 18:02:00.381882 17735 hierarchical_allocator_process.hpp:423] Removed
> slave 201304151800-16842879-5050-17720-29
>
> Thanks Guys
>
> --
> MSc Eduardo Costa Alfaia
> PhD Student
> Università degli Studi di Brescia
>



-- Vinod

