You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by Gaston Kleiman <ga...@mesosphere.io> on 2018/02/07 19:00:02 UTC

Review Request 65550: Made default executor not shutdown if unsubscribed during task launch.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65550/
-----------------------------------------------------------

Review request for mesos, Anand Mazumdar, Qian Zhang, and Vinod Kone.


Bugs: MESOS-8468
    https://issues.apache.org/jira/browse/MESOS-8468


Repository: mesos


Description
-------

The default executor would unnecessarily shutdown if, while launching a
task group, it gets unsubscribed after having successfully launched the
task group's containers.


Diffs
-----

  src/launcher/default_executor.cpp 4a619859095cc2d30f4806813f64a2e48c83b3ea 


Diff: https://reviews.apache.org/r/65550/diff/1/


Testing
-------

`make check` on GNU/Linux


Thanks,

Gaston Kleiman


Re: Review Request 65550: Made default executor not shutdown if unsubscribed during task launch.

Posted by Gaston Kleiman <ga...@mesosphere.io>.

> On Feb. 9, 2018, 6:36 p.m., Joseph Wu wrote:
> > src/launcher/default_executor.cpp
> > Lines 539-546 (original), 537 (patched)
> > <https://reviews.apache.org/r/65550/diff/1/?file=1954029#file1954029line539>
> >
> >     Since the guard above was removed, this CHECK could potentially be hit now.

Good catch, I'll remove the CHECK.


> On Feb. 9, 2018, 6:36 p.m., Joseph Wu wrote:
> > src/launcher/default_executor.cpp
> > Line 558 (original), 549 (patched)
> > <https://reviews.apache.org/r/65550/diff/1/?file=1954029#file1954029line558>
> >
> >     What happens when the executor is disconnected (as is now allowed) and attempts to launch some health checks?
> >     
> >     Any nested command checks would definitely fail.  But I suppose this is better than shutting down the executor.
> >     
> >     Seems like you need to either delay the creation of the health checks or pause them immediately after creation.

The checker process will treat connection errors as transient failures, and reschedule the check: https://github.com/apache/mesos/blob/a86ff8c36532f97b6eb6b44c6f871de24afbcc4d/src/checks/checker_process.cpp#L531-L538


Transient failures are logged, but not treated as a health check failure:
https://github.com/apache/mesos/blob/a86ff8c36532f97b6eb6b44c6f871de24afbcc4d/src/checks/checker_process.cpp#L353-L356


> On Feb. 9, 2018, 6:36 p.m., Joseph Wu wrote:
> > src/launcher/default_executor.cpp
> > Lines 626-631 (original), 617-622 (patched)
> > <https://reviews.apache.org/r/65550/diff/1/?file=1954029#file1954029line626>
> >
> >     This will be dropped if the executor isn't subscribed.  And as far as I can tell, this status update is not sent in any other location

If the executor isn't subscribed, the status updates will be added to the `unacknowledgedUpdates` map, and sent by `doReliableRegistration()` in the next `SUBSCRIBE` call: 


https://github.com/apache/mesos/blob/a86ff8c36532f97b6eb6b44c6f871de24afbcc4d/src/launcher/default_executor.cpp#L309-L343


The executor doesn't wait for the updates to be ack'd before shutting down (https://github.com/apache/mesos/blob/a86ff8c36532f97b6eb6b44c6f871de24afbcc4d/src/launcher/default_executor.cpp#L1020-L1024), so there's a possibility that these updates will be dropped if the executor is not connected to the agent upon disconnection. This is tracked in https://issues.apache.org/jira/browse/MESOS-8537.


- Gaston


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65550/#review197217
-----------------------------------------------------------


On Feb. 7, 2018, 11 a.m., Gaston Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65550/
> -----------------------------------------------------------
> 
> (Updated Feb. 7, 2018, 11 a.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Qian Zhang, and Vinod Kone.
> 
> 
> Bugs: MESOS-8468
>     https://issues.apache.org/jira/browse/MESOS-8468
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The default executor would unnecessarily shutdown if, while launching a
> task group, it gets unsubscribed after having successfully launched the
> task group's containers.
> 
> 
> Diffs
> -----
> 
>   src/launcher/default_executor.cpp 4a619859095cc2d30f4806813f64a2e48c83b3ea 
> 
> 
> Diff: https://reviews.apache.org/r/65550/diff/1/
> 
> 
> Testing
> -------
> 
> `make check` on GNU/Linux
> 
> 
> Thanks,
> 
> Gaston Kleiman
> 
>


Re: Review Request 65550: Made default executor not shutdown if unsubscribed during task launch.

Posted by Joseph Wu <jo...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65550/#review197217
-----------------------------------------------------------




src/launcher/default_executor.cpp
Lines 539-546 (original), 537 (patched)
<https://reviews.apache.org/r/65550/#comment277354>

    Since the guard above was removed, this CHECK could potentially be hit now.



src/launcher/default_executor.cpp
Line 558 (original), 549 (patched)
<https://reviews.apache.org/r/65550/#comment277357>

    What happens when the executor is disconnected (as is now allowed) and attempts to launch some health checks?
    
    Any nested command checks would definitely fail.  But I suppose this is better than shutting down the executor.
    
    Seems like you need to either delay the creation of the health checks or pause them immediately after creation.



src/launcher/default_executor.cpp
Lines 626-631 (original), 617-622 (patched)
<https://reviews.apache.org/r/65550/#comment277359>

    This will be dropped if the executor isn't subscribed.  And as far as I can tell, this status update is not sent in any other location


- Joseph Wu


On Feb. 7, 2018, 11 a.m., Gaston Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65550/
> -----------------------------------------------------------
> 
> (Updated Feb. 7, 2018, 11 a.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Qian Zhang, and Vinod Kone.
> 
> 
> Bugs: MESOS-8468
>     https://issues.apache.org/jira/browse/MESOS-8468
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The default executor would unnecessarily shutdown if, while launching a
> task group, it gets unsubscribed after having successfully launched the
> task group's containers.
> 
> 
> Diffs
> -----
> 
>   src/launcher/default_executor.cpp 4a619859095cc2d30f4806813f64a2e48c83b3ea 
> 
> 
> Diff: https://reviews.apache.org/r/65550/diff/1/
> 
> 
> Testing
> -------
> 
> `make check` on GNU/Linux
> 
> 
> Thanks,
> 
> Gaston Kleiman
> 
>


Re: Review Request 65550: Made default executor not shutdown if unsubscribed during task launch.

Posted by Qian Zhang <zh...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65550/#review197510
-----------------------------------------------------------


Ship it!




Ship It!

- Qian Zhang


On Feb. 13, 2018, 7:13 a.m., Gaston Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65550/
> -----------------------------------------------------------
> 
> (Updated Feb. 13, 2018, 7:13 a.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Qian Zhang, and Vinod Kone.
> 
> 
> Bugs: MESOS-8468
>     https://issues.apache.org/jira/browse/MESOS-8468
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The default executor would unnecessarily shutdown if, while launching a
> task group, it gets unsubscribed after having successfully launched the
> task group's containers.
> 
> 
> Diffs
> -----
> 
>   src/launcher/default_executor.cpp 4a619859095cc2d30f4806813f64a2e48c83b3ea 
> 
> 
> Diff: https://reviews.apache.org/r/65550/diff/4/
> 
> 
> Testing
> -------
> 
> `make check` on GNU/Linux
> 
> 
> Thanks,
> 
> Gaston Kleiman
> 
>


Re: Review Request 65550: Made default executor not shutdown if unsubscribed during task launch.

Posted by Gaston Kleiman <ga...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65550/
-----------------------------------------------------------

(Updated Feb. 12, 2018, 3:13 p.m.)


Review request for mesos, Anand Mazumdar, Qian Zhang, and Vinod Kone.


Changes
-------

Added logging statements.


Bugs: MESOS-8468
    https://issues.apache.org/jira/browse/MESOS-8468


Repository: mesos


Description
-------

The default executor would unnecessarily shutdown if, while launching a
task group, it gets unsubscribed after having successfully launched the
task group's containers.


Diffs (updated)
-----

  src/launcher/default_executor.cpp 4a619859095cc2d30f4806813f64a2e48c83b3ea 


Diff: https://reviews.apache.org/r/65550/diff/3/

Changes: https://reviews.apache.org/r/65550/diff/2-3/


Testing
-------

`make check` on GNU/Linux


Thanks,

Gaston Kleiman


Re: Review Request 65550: Made default executor not shutdown if unsubscribed during task launch.

Posted by Gaston Kleiman <ga...@mesosphere.io>.

> On Feb. 12, 2018, 12:13 p.m., Vinod Kone wrote:
> > src/launcher/default_executor.cpp
> > Line 349 (original), 349 (patched)
> > <https://reviews.apache.org/r/65550/diff/2/?file=1955927#file1955927line349>
> >
> >     Not yours, but I think we can kill this. I think this was added pre multiple task groups support. We should be able to look into the `containers` map AFAICT.

Killed in https://reviews.apache.org/r/65616/


- Gaston


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65550/#review197304
-----------------------------------------------------------


On Feb. 12, 2018, 3:13 p.m., Gaston Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65550/
> -----------------------------------------------------------
> 
> (Updated Feb. 12, 2018, 3:13 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Qian Zhang, and Vinod Kone.
> 
> 
> Bugs: MESOS-8468
>     https://issues.apache.org/jira/browse/MESOS-8468
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The default executor would unnecessarily shutdown if, while launching a
> task group, it gets unsubscribed after having successfully launched the
> task group's containers.
> 
> 
> Diffs
> -----
> 
>   src/launcher/default_executor.cpp 4a619859095cc2d30f4806813f64a2e48c83b3ea 
> 
> 
> Diff: https://reviews.apache.org/r/65550/diff/3/
> 
> 
> Testing
> -------
> 
> `make check` on GNU/Linux
> 
> 
> Thanks,
> 
> Gaston Kleiman
> 
>


Re: Review Request 65550: Made default executor not shutdown if unsubscribed during task launch.

Posted by Vinod Kone <vi...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65550/#review197304
-----------------------------------------------------------


Fix it, then Ship it!





src/launcher/default_executor.cpp
Line 349 (original), 349 (patched)
<https://reviews.apache.org/r/65550/#comment277450>

    Not yours, but I think we can kill this. I think this was added pre multiple task groups support. We should be able to look into the `containers` map AFAICT.



src/launcher/default_executor.cpp
Line 647 (original), 642 (patched)
<https://reviews.apache.org/r/65550/#comment277448>

    Can you log that we are skipping wait in an `else` statement.



src/launcher/default_executor.cpp
Line 651 (original), 647 (patched)
<https://reviews.apache.org/r/65550/#comment277449>

    Can you log here because this gets called from couple different sites?


- Vinod Kone


On Feb. 10, 2018, 6:35 a.m., Gaston Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65550/
> -----------------------------------------------------------
> 
> (Updated Feb. 10, 2018, 6:35 a.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Qian Zhang, and Vinod Kone.
> 
> 
> Bugs: MESOS-8468
>     https://issues.apache.org/jira/browse/MESOS-8468
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The default executor would unnecessarily shutdown if, while launching a
> task group, it gets unsubscribed after having successfully launched the
> task group's containers.
> 
> 
> Diffs
> -----
> 
>   src/launcher/default_executor.cpp 4a619859095cc2d30f4806813f64a2e48c83b3ea 
> 
> 
> Diff: https://reviews.apache.org/r/65550/diff/2/
> 
> 
> Testing
> -------
> 
> `make check` on GNU/Linux
> 
> 
> Thanks,
> 
> Gaston Kleiman
> 
>


Re: Review Request 65550: Made default executor not shutdown if unsubscribed during task launch.

Posted by Gaston Kleiman <ga...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65550/
-----------------------------------------------------------

(Updated Feb. 9, 2018, 10:35 p.m.)


Review request for mesos, Anand Mazumdar, Qian Zhang, and Vinod Kone.


Changes
-------

Addressed feedback.


Bugs: MESOS-8468
    https://issues.apache.org/jira/browse/MESOS-8468


Repository: mesos


Description
-------

The default executor would unnecessarily shutdown if, while launching a
task group, it gets unsubscribed after having successfully launched the
task group's containers.


Diffs (updated)
-----

  src/launcher/default_executor.cpp 4a619859095cc2d30f4806813f64a2e48c83b3ea 


Diff: https://reviews.apache.org/r/65550/diff/2/

Changes: https://reviews.apache.org/r/65550/diff/1-2/


Testing
-------

`make check` on GNU/Linux


Thanks,

Gaston Kleiman


Re: Review Request 65550: Made default executor not shutdown if unsubscribed during task launch.

Posted by Gaston Kleiman <ga...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65550/#review197223
-----------------------------------------------------------




src/launcher/default_executor.cpp
Lines 539-546 (original), 537 (patched)
<https://reviews.apache.org/r/65550/#comment277363>

    Good catch, I'll remove the CHECK.



src/launcher/default_executor.cpp
Line 558 (original), 549 (patched)
<https://reviews.apache.org/r/65550/#comment277362>

    The checker process will treat connection errors as transient failures, and reschedule the check: https://github.com/apache/mesos/blob/a86ff8c36532f97b6eb6b44c6f871de24afbcc4d/src/checks/checker_process.cpp#L531-L538
    
    Transient failures are logged, but not treated as a health check failure:
    https://github.com/apache/mesos/blob/a86ff8c36532f97b6eb6b44c6f871de24afbcc4d/src/checks/checker_process.cpp#L353-L356



src/launcher/default_executor.cpp
Lines 626-631 (original), 617-622 (patched)
<https://reviews.apache.org/r/65550/#comment277361>

    If the executor isn't subscribed, the stauts updates will be added to the `unacknowledgedUpdates` map, and sent by `doReliableRegistration()` in the next `SUBSCRIBE` call: 
    
    https://github.com/apache/mesos/blob/a86ff8c36532f97b6eb6b44c6f871de24afbcc4d/src/launcher/default_executor.cpp#L309-L343
    
    The executor doesn't wait for the updates to be ack'd before shutting down (https://github.com/apache/mesos/blob/a86ff8c36532f97b6eb6b44c6f871de24afbcc4d/src/launcher/default_executor.cpp#L1020-L1024), so there's a possibility that these updates will be dropped if the executor is not connected to the agent upon disconnection. This is tracked in https://issues.apache.org/jira/browse/MESOS-8537.


- Gaston Kleiman


On Feb. 7, 2018, 11 a.m., Gaston Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65550/
> -----------------------------------------------------------
> 
> (Updated Feb. 7, 2018, 11 a.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Qian Zhang, and Vinod Kone.
> 
> 
> Bugs: MESOS-8468
>     https://issues.apache.org/jira/browse/MESOS-8468
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The default executor would unnecessarily shutdown if, while launching a
> task group, it gets unsubscribed after having successfully launched the
> task group's containers.
> 
> 
> Diffs
> -----
> 
>   src/launcher/default_executor.cpp 4a619859095cc2d30f4806813f64a2e48c83b3ea 
> 
> 
> Diff: https://reviews.apache.org/r/65550/diff/1/
> 
> 
> Testing
> -------
> 
> `make check` on GNU/Linux
> 
> 
> Thanks,
> 
> Gaston Kleiman
> 
>