You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by Joseph Wu <jo...@mesosphere.io> on 2019/05/01 02:13:48 UTC

Review Request 70577: Changed Agent V1 GET_STATE for any completed executor's tasks.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70577/
-----------------------------------------------------------

Review request for mesos, Gilbert Song, Greg Mann, and Vinod Kone.


Bugs: MESOS-9750
    https://issues.apache.org/jira/browse/MESOS-9750


Repository: mesos


Description
-------

It is possible for a completed executor to have a non-terminal task
(based on last status update).  For example, during graceful shutdown
of an agent, graceful shutdown of an executor will race with the
agent's shutdown grace period.  If the executor does not send a
TASK_KILLED in time, the agent will still mark the executor as complete
and kill it.

After agent recovery, these completed executors will show up in an
agent's /state and GET_STATE responses.  In GET_STATE however, any
non-terminal tasks will appear under `launched_tasks`.  This may
provide misleading information about the total number of tasks running.

This commit adds extra logic to place these non-terminal tasks under
the `terminated_tasks` category, and adds a regression test.


Diffs
-----

  src/slave/http.cpp 2c4e792d16ad4fd3303760a9db3cba4269152e7d 
  src/tests/api_tests.cpp bc19d7e9661f091aebf3072967548c7c3196239d 


Diff: https://reviews.apache.org/r/70577/diff/1/


Testing
-------

```
make 
src/mesos-tests --gtest_filter="*GetStateWithNonTerminalCompletedTask*" --verbose
```

TODO: more testing


Thanks,

Joseph Wu


Re: Review Request 70577: Changed Agent V1 GET_STATE for any completed executor's tasks.

Posted by Mesos Reviewbot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70577/#review214977
-----------------------------------------------------------



Patch looks great!

Reviews applied: [70577]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose --disable-libtool-wrappers --disable-parallel-test-execution' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On May 1, 2019, 2:13 a.m., Joseph Wu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70577/
> -----------------------------------------------------------
> 
> (Updated May 1, 2019, 2:13 a.m.)
> 
> 
> Review request for mesos, Gilbert Song, Greg Mann, and Vinod Kone.
> 
> 
> Bugs: MESOS-9750
>     https://issues.apache.org/jira/browse/MESOS-9750
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> It is possible for a completed executor to have a non-terminal task
> (based on last status update).  For example, during graceful shutdown
> of an agent, graceful shutdown of an executor will race with the
> agent's shutdown grace period.  If the executor does not send a
> TASK_KILLED in time, the agent will still mark the executor as complete
> and kill it.
> 
> After agent recovery, these completed executors will show up in an
> agent's /state and GET_STATE responses.  In GET_STATE however, any
> non-terminal tasks will appear under `launched_tasks`.  This may
> provide misleading information about the total number of tasks running.
> 
> This commit adds extra logic to place these non-terminal tasks under
> the `terminated_tasks` category, and adds a regression test.
> 
> 
> Diffs
> -----
> 
>   src/slave/http.cpp 2c4e792d16ad4fd3303760a9db3cba4269152e7d 
>   src/tests/api_tests.cpp bc19d7e9661f091aebf3072967548c7c3196239d 
> 
> 
> Diff: https://reviews.apache.org/r/70577/diff/1/
> 
> 
> Testing
> -------
> 
> ```
> make 
> src/mesos-tests --gtest_filter="*GetStateWithNonTerminalCompletedTask*" --verbose
> ```
> 
> TODO: more testing
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>


Re: Review Request 70577: Changed Agent V1 GET_STATE for any completed executor's tasks.

Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70577/#review214976
-----------------------------------------------------------



PASS: Mesos patch 70577 was successfully built and tested.

Reviews applied: `['70577']`

All the build artifacts available at: http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/3304/mesos-review-70577

- Mesos Reviewbot Windows


On May 1, 2019, 2:13 a.m., Joseph Wu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70577/
> -----------------------------------------------------------
> 
> (Updated May 1, 2019, 2:13 a.m.)
> 
> 
> Review request for mesos, Gilbert Song, Greg Mann, and Vinod Kone.
> 
> 
> Bugs: MESOS-9750
>     https://issues.apache.org/jira/browse/MESOS-9750
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> It is possible for a completed executor to have a non-terminal task
> (based on last status update).  For example, during graceful shutdown
> of an agent, graceful shutdown of an executor will race with the
> agent's shutdown grace period.  If the executor does not send a
> TASK_KILLED in time, the agent will still mark the executor as complete
> and kill it.
> 
> After agent recovery, these completed executors will show up in an
> agent's /state and GET_STATE responses.  In GET_STATE however, any
> non-terminal tasks will appear under `launched_tasks`.  This may
> provide misleading information about the total number of tasks running.
> 
> This commit adds extra logic to place these non-terminal tasks under
> the `terminated_tasks` category, and adds a regression test.
> 
> 
> Diffs
> -----
> 
>   src/slave/http.cpp 2c4e792d16ad4fd3303760a9db3cba4269152e7d 
>   src/tests/api_tests.cpp bc19d7e9661f091aebf3072967548c7c3196239d 
> 
> 
> Diff: https://reviews.apache.org/r/70577/diff/1/
> 
> 
> Testing
> -------
> 
> ```
> make 
> src/mesos-tests --gtest_filter="*GetStateWithNonTerminalCompletedTask*" --verbose
> ```
> 
> TODO: more testing
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>


Re: Review Request 70577: Changed Agent V1 GET_STATE for any completed executor's tasks.

Posted by Joseph Wu <jo...@mesosphere.io>.

> On April 30, 2019, 9:13 p.m., Benjamin Mahler wrote:
> > Hm.. at first glance it seems odd to put the special logic in the http response handler instead of just having the internal state reflect that the task is terminal (after all.. it is terminated). Cant't the agnet move the tasks to an appropriate terminal state prior to "completing" the executor?
> > 
> > That way, we also will know that the endpoints are accurately reflecting the in-memory state.

Your suggestion leads to a neater change, and I've posted the revision.  The test needed no changes :)


- Joseph


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70577/#review214974
-----------------------------------------------------------


On May 1, 2019, 12:21 p.m., Joseph Wu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70577/
> -----------------------------------------------------------
> 
> (Updated May 1, 2019, 12:21 p.m.)
> 
> 
> Review request for mesos, Gilbert Song, Greg Mann, and Vinod Kone.
> 
> 
> Bugs: MESOS-9750
>     https://issues.apache.org/jira/browse/MESOS-9750
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> It is possible for a completed executor to have a non-terminal task
> (based on last status update).  For example, during graceful shutdown
> of an agent, graceful shutdown of an executor will race with the
> agent's shutdown grace period.  If the executor does not send a
> TASK_KILLED in time, the agent will still mark the executor as complete
> and kill it.
> 
> After agent recovery, these completed executors will show up in an
> agent's /state and GET_STATE responses.  In GET_STATE however, any
> non-terminal tasks will appear under `launched_tasks`.  This may
> provide misleading information about the total number of tasks running.
> 
> This commit adds extra logic to place these non-terminal tasks under
> the `terminated_tasks` category, and adds a regression test.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.cpp 95f05a18c7905d5032de1cd35726ac3a17f0b682 
>   src/tests/api_tests.cpp bc19d7e9661f091aebf3072967548c7c3196239d 
> 
> 
> Diff: https://reviews.apache.org/r/70577/diff/2/
> 
> 
> Testing
> -------
> 
> ```
> make check
> src/mesos-tests --gtest_filter="*GetStateWithNonTerminalCompletedTask*" --verbose
> ```
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>


Re: Review Request 70577: Changed Agent V1 GET_STATE for any completed executor's tasks.

Posted by Benjamin Mahler <bm...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70577/#review214974
-----------------------------------------------------------



Hm.. at first glance it seems odd to put the special logic in the http response handler instead of just having the internal state reflect that the task is terminal (after all.. it is terminated). Cant't the agnet move the tasks to an appropriate terminal state prior to "completing" the executor?

That way, we also will know that the endpoints are accurately reflecting the in-memory state.

- Benjamin Mahler


On May 1, 2019, 2:13 a.m., Joseph Wu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70577/
> -----------------------------------------------------------
> 
> (Updated May 1, 2019, 2:13 a.m.)
> 
> 
> Review request for mesos, Gilbert Song, Greg Mann, and Vinod Kone.
> 
> 
> Bugs: MESOS-9750
>     https://issues.apache.org/jira/browse/MESOS-9750
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> It is possible for a completed executor to have a non-terminal task
> (based on last status update).  For example, during graceful shutdown
> of an agent, graceful shutdown of an executor will race with the
> agent's shutdown grace period.  If the executor does not send a
> TASK_KILLED in time, the agent will still mark the executor as complete
> and kill it.
> 
> After agent recovery, these completed executors will show up in an
> agent's /state and GET_STATE responses.  In GET_STATE however, any
> non-terminal tasks will appear under `launched_tasks`.  This may
> provide misleading information about the total number of tasks running.
> 
> This commit adds extra logic to place these non-terminal tasks under
> the `terminated_tasks` category, and adds a regression test.
> 
> 
> Diffs
> -----
> 
>   src/slave/http.cpp 2c4e792d16ad4fd3303760a9db3cba4269152e7d 
>   src/tests/api_tests.cpp bc19d7e9661f091aebf3072967548c7c3196239d 
> 
> 
> Diff: https://reviews.apache.org/r/70577/diff/1/
> 
> 
> Testing
> -------
> 
> ```
> make 
> src/mesos-tests --gtest_filter="*GetStateWithNonTerminalCompletedTask*" --verbose
> ```
> 
> TODO: more testing
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>


Re: Review Request 70577: Changed Agent V1 GET_STATE for any completed executor's tasks.

Posted by Mesos Reviewbot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70577/#review214984
-----------------------------------------------------------



Patch looks great!

Reviews applied: [70577]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose --disable-libtool-wrappers --disable-parallel-test-execution' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On May 1, 2019, 7:21 p.m., Joseph Wu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70577/
> -----------------------------------------------------------
> 
> (Updated May 1, 2019, 7:21 p.m.)
> 
> 
> Review request for mesos, Gilbert Song, Greg Mann, and Vinod Kone.
> 
> 
> Bugs: MESOS-9750
>     https://issues.apache.org/jira/browse/MESOS-9750
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> It is possible for a completed executor to have a non-terminal task
> (based on last status update).  For example, during graceful shutdown
> of an agent, graceful shutdown of an executor will race with the
> agent's shutdown grace period.  If the executor does not send a
> TASK_KILLED in time, the agent will still mark the executor as complete
> and kill it.
> 
> After agent recovery, these completed executors will show up in an
> agent's /state and GET_STATE responses.  In GET_STATE however, any
> non-terminal tasks will appear under `launched_tasks`.  This may
> provide misleading information about the total number of tasks running.
> 
> This commit adds extra logic to place these non-terminal tasks under
> the `terminated_tasks` category, and adds a regression test.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.cpp 95f05a18c7905d5032de1cd35726ac3a17f0b682 
>   src/tests/api_tests.cpp bc19d7e9661f091aebf3072967548c7c3196239d 
> 
> 
> Diff: https://reviews.apache.org/r/70577/diff/2/
> 
> 
> Testing
> -------
> 
> ```
> make check
> src/mesos-tests --gtest_filter="*GetStateWithNonTerminalCompletedTask*" --verbose
> ```
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>


Re: Review Request 70577: Changed Agent V1 GET_STATE for any completed executor's tasks.

Posted by Greg Mann <gr...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70577/#review215512
-----------------------------------------------------------


Fix it, then Ship it!





src/tests/api_tests.cpp
Lines 6761 (patched)
<https://reviews.apache.org/r/70577/#comment302245>

    How about verifying the latest state of this task?


- Greg Mann


On May 1, 2019, 7:21 p.m., Joseph Wu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70577/
> -----------------------------------------------------------
> 
> (Updated May 1, 2019, 7:21 p.m.)
> 
> 
> Review request for mesos, Gilbert Song, Greg Mann, and Vinod Kone.
> 
> 
> Bugs: MESOS-9750
>     https://issues.apache.org/jira/browse/MESOS-9750
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> It is possible for a completed executor to have a non-terminal task
> (based on last status update).  For example, during graceful shutdown
> of an agent, graceful shutdown of an executor will race with the
> agent's shutdown grace period.  If the executor does not send a
> TASK_KILLED in time, the agent will still mark the executor as complete
> and kill it.
> 
> After agent recovery, these completed executors will show up in an
> agent's /state and GET_STATE responses.  In GET_STATE however, any
> non-terminal tasks will appear under `launched_tasks`.  This may
> provide misleading information about the total number of tasks running.
> 
> This commit adds extra logic to place these non-terminal tasks under
> the `terminated_tasks` category, and adds a regression test.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.cpp 95f05a18c7905d5032de1cd35726ac3a17f0b682 
>   src/tests/api_tests.cpp bc19d7e9661f091aebf3072967548c7c3196239d 
> 
> 
> Diff: https://reviews.apache.org/r/70577/diff/2/
> 
> 
> Testing
> -------
> 
> ```
> make check
> src/mesos-tests --gtest_filter="*GetStateWithNonTerminalCompletedTask*" --verbose
> ```
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>


Re: Review Request 70577: Changed Agent V1 GET_STATE for any completed executor's tasks.

Posted by Joseph Wu <jo...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70577/
-----------------------------------------------------------

(Updated May 1, 2019, 12:21 p.m.)


Review request for mesos, Gilbert Song, Greg Mann, and Vinod Kone.


Changes
-------

Alternate fix where we shift the in-memory structure instead of changing the V1 API directly.


Bugs: MESOS-9750
    https://issues.apache.org/jira/browse/MESOS-9750


Repository: mesos


Description
-------

It is possible for a completed executor to have a non-terminal task
(based on last status update).  For example, during graceful shutdown
of an agent, graceful shutdown of an executor will race with the
agent's shutdown grace period.  If the executor does not send a
TASK_KILLED in time, the agent will still mark the executor as complete
and kill it.

After agent recovery, these completed executors will show up in an
agent's /state and GET_STATE responses.  In GET_STATE however, any
non-terminal tasks will appear under `launched_tasks`.  This may
provide misleading information about the total number of tasks running.

This commit adds extra logic to place these non-terminal tasks under
the `terminated_tasks` category, and adds a regression test.


Diffs (updated)
-----

  src/slave/slave.cpp 95f05a18c7905d5032de1cd35726ac3a17f0b682 
  src/tests/api_tests.cpp bc19d7e9661f091aebf3072967548c7c3196239d 


Diff: https://reviews.apache.org/r/70577/diff/2/

Changes: https://reviews.apache.org/r/70577/diff/1-2/


Testing (updated)
-------

```
make check
src/mesos-tests --gtest_filter="*GetStateWithNonTerminalCompletedTask*" --verbose
```


Thanks,

Joseph Wu