Posted to dev@mesos.apache.org by Aditi Dixit <ad...@gmail.com> on 2015/06/19 16:03:24 UTC

Help required in doing Reregistration

Hi,
I modified the test_framework.cpp file in examples with the help of Joris
so that I can check for framework reregistrations (the patch is here:
https://gist.github.com/atidix/8d1de11f28744e934496).

According to MesosSchedulerDriver, after I call driver->stop(true), since
failover is true, the scheduler should reregister the framework, right? But
what I see happening is that the framework stops and then registers again.
Am I misunderstanding something, or is this how it should go down?

Regards,
Aditi Dixit

PS: Output logs to verify my point

akshay@charizardz:~/mesos/build$ ./src/test-framework --master=127.0.1.1:5050
I0619 17:50:14.418045 29934 sched.cpp:157] Version: 0.23.0
I0619 17:50:14.429234 29948 sched.cpp:254] New master detected at master@127.0.1.1:5050
I0619 17:50:14.430210 29948 sched.cpp:264] No credentials provided. Attempting to register without authentication
I0619 17:50:14.436782 29954 sched.cpp:448] Framework registered with 20150619-163754-16842879-5050-29356-0005
Registered with id20150619-163754-16842879-5050-29356-0005!
Received offer 20150619-163754-16842879-5050-29356-O10 with mem(*):4892; disk(*):692933; ports(*):[31000-32000]; cpus(*):4
Launching task 0 using offer 20150619-163754-16842879-5050-29356-O10
Launching task 1 using offer 20150619-163754-16842879-5050-29356-O10
Launching task 2 using offer 20150619-163754-16842879-5050-29356-O10
Launching task 3 using offer 20150619-163754-16842879-5050-29356-O10
Task 0 is in state TASK_RUNNING
Task 1 is in state TASK_RUNNING
Task 0 is in state TASK_FINISHED
Task 2 is in state TASK_RUNNING
Task 1 is in state TASK_FINISHED
Task 3 is in state TASK_RUNNING
Task 2 is in state TASK_FINISHED
Task 3 is in state TASK_FINISHED
Received offer 20150619-163754-16842879-5050-29356-O11 with mem(*):4892; disk(*):692933; ports(*):[31000-32000]; cpus(*):4
Launching task 4 using offer 20150619-163754-16842879-5050-29356-O11
Task 4 is in state TASK_RUNNING
Task 4 is in state TASK_FINISHED
I0619 17:50:15.555902 29950 sched.cpp:1591] Asked to stop the driver
I0619 17:50:15.555987 29950 sched.cpp:831] Stopping framework '20150619-163754-16842879-5050-29356-0005'
I0619 17:50:15.556049 29934 sched.cpp:1591] Asked to stop the driver
I0619 17:50:15.562646 29934 sched.cpp:157] Version: 0.23.0
I0619 17:50:15.563300 29948 sched.cpp:254] New master detected at master@127.0.1.1:5050
I0619 17:50:15.563480 29948 sched.cpp:264] No credentials provided. Attempting to register without authentication
I0619 17:50:15.565598 29948 sched.cpp:448] Framework registered with 20150619-163754-16842879-5050-29356-0005
Registered with id20150619-163754-16842879-5050-29356-0005!
Task 4 is in state TASK_FINISHED
Received offer 20150619-163754-16842879-5050-29356-O12 with mem(*):4892;
disk(*):692933; ports(*):[31000-32000]; cpus(*):4
Launching task 0 using offer 20150619-163754-16842879-5050-29356-O12
Launching task 1 using offer 20150619-163754-16842879-5050-29356-O12
Launching task 2 using offer 20150619-163754-16842879-5050-29356-O12
Launching task 3 using offer 20150619-163754-16842879-5050-29356-O12
Task 0 is in state TASK_RUNNING
Task 1 is in state TASK_RUNNING
Task 2 is in state TASK_RUNNING
Task 3 is in state TASK_RUNNING
Task 0 is in state TASK_FINISHED
Task 1 is in state TASK_FINISHED
Task 2 is in state TASK_FINISHED
Task 3 is in state TASK_FINISHED
I0619 17:50:16.592545 29954 sched.cpp:1591] Asked to stop the driver
I0619 17:50:16.592618 29954 sched.cpp:831] Stopping framework '20150619-163754-16842879-5050-29356-0005'
I0619 17:50:16.592664 29934 sched.cpp:1591] Asked to stop the driver

Re: Help required in doing Reregistration

Posted by haosdent <ha...@gmail.com>.
Hi @Aditi. According to the documentation:
```
Stops the scheduler driver. If the 'failover' flag is set to false then it
is expected that this framework will never reconnect to Mesos. So Mesos
will unregister the framework and shutdown all its tasks and executors. If
'failover' is true, all executors and tasks will remain running (for some
framework specific failover timeout) allowing the scheduler to reconnect
(possibly in the same process, or from a different process, for example, on
a different machine).
```
I think the scheduler driver's behavior in the log above is correct.




-- 
Best Regards,
Haosdent Huang

Re: Help required in doing Reregistration

Posted by haosdent <ha...@gmail.com>.
LoL. If not set, it is zero. Ignore my previous response, sorry. @Aditi

On Sat, Jun 20, 2015 at 1:11 AM, haosdent <ha...@gmail.com> wrote:

> default failover_timeout is zero?



-- 
Best Regards,
Haosdent Huang

Re: Help required in doing Reregistration

Posted by haosdent <ha...@gmail.com>.
Is the default failover_timeout zero?

On Sat, Jun 20, 2015 at 1:04 AM, Adam Bordelon <ad...@mesosphere.io> wrote:

> Aha! You also need to set the FrameworkInfo.failover_timeout to a non-zero
> value. If it is 0, then Mesos will instantly shutdown your framework when
> the schedulerdriver disconnects.



-- 
Best Regards,
Haosdent Huang

Re: Help required in doing Reregistration

Posted by Adam Bordelon <ad...@mesosphere.io>.
Aha! You also need to set FrameworkInfo.failover_timeout to a non-zero
value. If it is 0, Mesos will shut down your framework immediately when
the scheduler driver disconnects.

On Fri, Jun 19, 2015 at 9:56 AM, haosdent <ha...@gmail.com> wrote:

> From you gist, I think `reregistered` would be call after driver disconnect
> to master and connect to master again. Not happend when call stop and start
> a new driver with same framework id. You could check out this test case
> FaultToleranceTest.FrameworkReregister .

Re: Help required in doing Reregistration

Posted by haosdent <ha...@gmail.com>.
From your gist, I think `reregistered` is called when a driver disconnects
from the master and then connects to it again. It is not called when you stop
one driver and start a new driver with the same framework id. You could check
out the test case FaultToleranceTest.FrameworkReregister.

On Sat, Jun 20, 2015 at 12:31 AM, Vinod Kone <vi...@gmail.com> wrote:

> Can you send us the gist?



-- 
Best Regards,
Haosdent Huang

Re: Help required in doing Reregistration

Posted by Aditi Dixit <ad...@gmail.com>.
Sure. You can find the modified test_framework.cpp here:
https://gist.github.com/atidix/8ebadf460634bbd23d71

On Fri, Jun 19, 2015 at 10:01 PM, Vinod Kone <vi...@gmail.com> wrote:

> Can you send us the gist?

Re: Help required in doing Reregistration

Posted by Vinod Kone <vi...@gmail.com>.
Can you send us the gist?

@vinodkone

> On Jun 19, 2015, at 8:49 AM, Aditi Dixit <ad...@gmail.com> wrote:
> 
> Hi,
> Thanks for the responses haosdent and Adam.
> I'm sorry if I wasn't clear enough, but yes, the patch exactly does what
> Adam just mentioned.
> We create 2 instances of the driver and in the second time, set the
> frameworkId in the FrameworkInfo to the id from when we register the first
> time, and start the driver again.
> This should hopefully satisfy all prerequisites. So my question still
> stands. Thanks in advance.
> Regards,
> Aditi Dixit

Re: Help required in doing Reregistration

Posted by Aditi Dixit <ad...@gmail.com>.
Hi,
Thanks for the responses haosdent and Adam.
I'm sorry if I wasn't clear enough, but yes, the patch does exactly what
Adam just mentioned.
We create two instances of the driver, and for the second one we set the
frameworkId in the FrameworkInfo to the id we received when we first
registered, then start the driver again.
This should hopefully satisfy all prerequisites. So my question still
stands. Thanks in advance.
Regards,
Aditi Dixit

Re: Help required in doing Reregistration

Posted by Adam Bordelon <ad...@mesosphere.io>.
Aditi,

driver->stop() will shut down the SchedulerDriver; passing in 'true'
just means that it doesn't shut down the framework within Mesos and kill its
tasks. You will have to manually create a new SchedulerDriver for it to
reregister with Mesos, whether you do that within your still-running
scheduler or by shutting down and restarting the scheduler. Note that you
will need to reregister with the same frameworkId to reconnect to the
running tasks.
