Posted to user@mesos.apache.org by Damien Hardy <dh...@viadeoteam.com> on 2013/09/23 11:52:16 UTC

mesos frameworks registration ?

Hello there,

I might be missing something about framework deployment on mesos.

I am trying to get the chronos or marathon frameworks working with HEAD of
mesos running in distributed mode.

The mesos topology seems OK: slaves report to the master and I can see offers
of resources (total available) on the mesos HTTP interface.

192.168.255.1 : marathon or chronos
192.168.255.2 : zookeeper + mesos master
192.168.255.3 : mesos slave

Then I start marathon or chronos (HEAD version for both, with pom.xml using
"<mesos.version>0.15.0-20130910-2</mesos.version>", for example).

It seems to succeed in finding the master; I can see the frameworks listed.
But the mesos services seem to complain permanently, flooding the logs on the
slave with:

```
2013-09-23 11:35:37,405:2264(0x7faf54a73700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
W0923 11:35:38.002933  2267 slave.cpp:1322] Ignoring updating pid for framework marathon-0.0.6 because it does not exist
W0923 11:35:38.359627  2269 slave.cpp:1322] Ignoring updating pid for framework marathon-0.0.6 because it does not exist
W0923 11:35:39.003171  2266 slave.cpp:1322] Ignoring updating pid for framework marathon-0.0.6 because it does not exist
```

and on the master with:

I0923 11:35:33.420017  3685 master.cpp:734] Re-registering framework marathon-0.0.6 at scheduler(1)@192.168.3.224:58107
I0923 11:35:33.420178  3685 master.cpp:753] Framework marathon-0.0.6 failed over
I0923 11:35:33.668504  3683 master.cpp:1445] Sending 1 offers to framework marathon-0.0.6
W0923 11:35:33.708227  3686 master.cpp:80] No whitelist given. Advertising offers for all slaves
I0923 11:35:33.776002  3686 master.cpp:734] Re-registering framework marathon-0.0.6 at scheduler(1)@192.168.3.224:58107
I0923 11:35:33.776146  3686 master.cpp:753] Framework marathon-0.0.6 failed over
I0923 11:35:33.776432  3684 hierarchical_allocator_process.hpp:598] Recovered cpus(*):2; mem(*):2942; disk(*):35195; ports(*):[31000-32000] (total allocatable: cpus(*):2; mem(*):2942; disk(*):35195; ports(*):[31000-32000]) on slave 201309231034-50309312-5050-1111-2 from framework marathon-0.0.6
I0923 11:35:34.419661  3686 master.cpp:734] Re-registering framework marathon-0.0.6 at scheduler(1)@192.168.3.224:58107
I0923 11:35:34.419801  3686 master.cpp:753] Framework marathon-0.0.6 failed over
I0923 11:35:34.669680  3684 master.cpp:1445] Sending 1 offers to framework marathon-0.0.6
I0923 11:35:34.776325  3684 master.cpp:734] Re-registering framework marathon-0.0.6 at scheduler(1)@192.168.3.224:58107
I0923 11:35:34.776445  3684 master.cpp:753] Framework marathon-0.0.6 failed over
I0923 11:35:34.776748  3684 hierarchical_allocator_process.hpp:598] Recovered cpus(*):2; mem(*):2942; disk(*):35195; ports(*):[31000-32000] (total allocatable: cpus(*):2; mem(*):2942; disk(*):35195; ports(*):[31000-32000]) on slave 201309231034-50309312-5050-1111-2 from framework marathon-0.0.6
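
To put a number on how often the framework re-registers, a master log like the one above can be scanned with a small script. This is only a rough sketch: it assumes each log entry is on a single line (as in the master's own log file), and the names `reregistrations` and `mean_interval_seconds` are made up for illustration. It also prints the scheduler address the master sees, which in this log is 192.168.3.224 rather than the 192.168.255.1 listed in the topology; whether that matters is not established here.

```python
import re
from datetime import datetime

# Matches master log lines such as:
# I0923 11:35:33.420017  3685 master.cpp:734] Re-registering framework <id> at <pid>
REREGISTER = re.compile(
    r"^[IWEF]\d{4} (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ "
    r"master\.cpp:\d+\] Re-registering framework (?P<framework>\S+) at (?P<pid>\S+)$"
)

# Sample taken from the master log in this message (entries on one line each).
SAMPLE = """
I0923 11:35:33.420017  3685 master.cpp:734] Re-registering framework marathon-0.0.6 at scheduler(1)@192.168.3.224:58107
I0923 11:35:33.420178  3685 master.cpp:753] Framework marathon-0.0.6 failed over
I0923 11:35:33.776002  3686 master.cpp:734] Re-registering framework marathon-0.0.6 at scheduler(1)@192.168.3.224:58107
I0923 11:35:34.419661  3686 master.cpp:734] Re-registering framework marathon-0.0.6 at scheduler(1)@192.168.3.224:58107
I0923 11:35:34.776325  3684 master.cpp:734] Re-registering framework marathon-0.0.6 at scheduler(1)@192.168.3.224:58107
"""


def reregistrations(log_text):
    """Return (time, framework, scheduler_pid) for every re-registration line."""
    events = []
    for line in log_text.splitlines():
        match = REREGISTER.match(line.strip())
        if match:
            # The log prefix carries no year, so only the time of day is parsed.
            when = datetime.strptime(match["time"], "%H:%M:%S.%f")
            events.append((when, match["framework"], match["pid"]))
    return events


def mean_interval_seconds(events):
    """Average gap between consecutive re-registrations, or None if fewer than two."""
    if len(events) < 2:
        return None
    times = sorted(when for when, _, _ in events)
    return (times[-1] - times[0]).total_seconds() / (len(times) - 1)


if __name__ == "__main__":
    found = reregistrations(SAMPLE)
    print("re-registrations:", len(found))
    print("scheduler pids:", sorted({pid for _, _, pid in found}))
    print("mean interval (s):", mean_interval_seconds(found))
```

A framework that registers once and stays connected should produce very few such lines, so several per second from the same scheduler pid points at a registration loop rather than at genuine failovers.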

When I try to start a service with marathon, based on the example given:

marathon -H http://192.168.255.1:8080 start -i chronos -u
https://s3.amazonaws.com/mesosphere-binaries-public/chronos/chronos.tgz -C
"./chronos/bin/demo ./chronos/config/nomail.yml
./chronos/target/chronos-1.0-SNAPSHOT.jar"
Starting app 'chronos'
ERROR:

It seems to be there:

marathon -H http://192.168.255.1:8080 list
App ID:    chronos
Command:   ./chronos/bin/demo ./chronos/config/nomail.yml
./chronos/target/chronos-1.0-SNAPSHOT.jar
Instances: 1
CPUs:      1.0
Memory:    10.0 MB
URI:
https://s3.amazonaws.com/mesosphere-binaries-public/chronos/chronos.tgz

chronos has the same problem about the non-existing id on the slave: I can
create a scheduled command but it is never executed.

Thank you for any help understanding this.

-- 
Damien HARDY

Re: mesos frameworks registration ?

Posted by Damien Hardy <dh...@viadeoteam.com>.
Hello,

I gave mesos-0.14.0-rc4 a try, and the 2 frameworks can register on mesos
and execute tasks (batch and long running).
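
The thread does not establish why HEAD failed, nor which mesos.version the frameworks were built against after the switch. One thing that can be ruled out cheaply in a setup like this is a mismatch between the library version declared in the framework's pom.xml and the version the cluster runs. The sketch below is only an illustration of that check: the helper names are hypothetical, and comparing only major.minor is an assumption, not a documented Mesos compatibility rule.

```python
import re


def pom_mesos_version(pom_text):
    """Extract the <mesos.version> property from pom.xml text, or None if absent."""
    match = re.search(r"<mesos\.version>\s*([^<\s]+)\s*</mesos\.version>", pom_text)
    return match.group(1) if match else None


def major_minor(version):
    """Return (major, minor) from strings such as '0.15.0-20130910-2' or '0.14.0-rc4'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    return int(match.group(1)), int(match.group(2))


def same_release_line(framework_version, cluster_version):
    """True when both versions share major.minor (an assumed, simplified criterion)."""
    return major_minor(framework_version) == major_minor(cluster_version)


if __name__ == "__main__":
    pom = "<properties><mesos.version>0.15.0-20130910-2</mesos.version></properties>"
    declared = pom_mesos_version(pom)
    print(declared, "vs 0.14.0-rc4:", same_release_line(declared, "0.14.0-rc4"))
```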



2013/9/24 Damien Hardy <dh...@viadeoteam.com>

> Not yet,
>
> I suspected some misconfiguration on the mesos side because chronos has the
> same behaviour.
>
>
>
> 2013/9/23 Benjamin Mahler <be...@gmail.com>
>
>> It looks like the Marathon framework is continually failing over, have
>> you sought help from the Marathon developers?



-- 
Damien HARDY
IT Infrastructure Architect
Viadeo - 30 rue de la Victoire - 75009 Paris - France

Re: mesos frameworks registration ?

Posted by Damien Hardy <dh...@viadeoteam.com>.
Not yet,

I suspected some misconfiguration on the mesos side because chronos has the
same behaviour.



2013/9/23 Benjamin Mahler <be...@gmail.com>

> It looks like the Marathon framework is continually failing over, have you
> sought help from the Marathon developers?


-- 
Damien HARDY
IT Infrastructure Architect
Viadeo - 30 rue de la Victoire - 75009 Paris - France

Re: mesos frameworks registration ?

Posted by Benjamin Mahler <be...@gmail.com>.
It looks like the Marathon framework is continually failing over, have you
sought help from the Marathon developers?

